ADA037815


P-77-1 


QUESTIONNAIRE  CONSTRUCTION  MANUAL 


FORT  HOOD  FIELD  UNIT 


U.S. Army


Research  Institute  for  the  Behavioral  and  Social  Sciences 


July 1976


U.  S.  ARMY  RESEARCH  INSTITUTE 

FOR  THE  BEHAVIORAL  AND  SOCIAL  SCIENCES 

A Field  Operating  Agency  under  the  Jurisdiction  of  the 
Deputy  Chief  of  Staff  for  Personnel 


J. E. UHLANER
Technical Director


W. C. MAUS
COL, GS
Commander


Unclassified

SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)


REPORT DOCUMENTATION PAGE

2.  GOVT ACCESSION NO.

3.  RECIPIENT'S CATALOG NUMBER

4.  TITLE (and Subtitle)
    QUESTIONNAIRE CONSTRUCTION MANUAL

5.  TYPE OF REPORT & PERIOD COVERED

6.  PERFORMING ORG. REPORT NUMBER

7.  AUTHOR(s)
    R. F. Dyer, J. J. Matthews, C. E. Wright, and K. L. Yudowitch

8.  CONTRACT OR GRANT NUMBER(s)
    DAHC19-74-C-0032

9.  PERFORMING ORGANIZATION NAME AND ADDRESS
    Operations Research Associates
    Palo Alto, California

10. PROGRAM ELEMENT, PROJECT, TASK AREA & WORK UNIT NUMBERS
    2Q763731A775

11. CONTROLLING OFFICE NAME AND ADDRESS
    TRADOC Combined Arms Test Activity
    Fort Hood, Texas 76544

12. REPORT DATE
    July 1976

13. NUMBER OF PAGES
    133

14. MONITORING AGENCY NAME & ADDRESS (if different from Controlling Office)
    U.S. Army Research Institute for the Behavioral and Social Sciences
    ARI Field Unit-Fort Hood, HQ TCATA (PERI-OH)
    Fort Hood, Texas 76544

15. SECURITY CLASS. (of this report)
    Unclassified

15a. DECLASSIFICATION/DOWNGRADING SCHEDULE

16. DISTRIBUTION STATEMENT (of this Report)
    Approved for public release; distribution unlimited

17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if different from Report)

18. SUPPLEMENTARY NOTES
    Contracting Officer's Technical Representative was George M. Gividen, Chief,
    ARI Field Office at Fort Hood, Texas. Companion volume is "Questionnaire
    Construction Manual Annex: Literature Survey and Bibliography."

19. KEY WORDS (Continue on reverse side if necessary and identify by block number)
    Question construction
    Instructional manual on test construction
    Test development
    Item development
    Questionnaire administration

20. ABSTRACT (Continue on reverse side if necessary and identify by block number)

This manual has been prepared primarily for the use and guidance of those
who are tasked to develop or administer questionnaires as part of Army field
tests and evaluations. The general content and concepts, however, should be
useful to anyone involved in constructing or administering surveys, interviews,
or questionnaires. Chapters II-X present guidance on preparing, assembling,
and arranging items in questionnaires. Chapter XI discusses the importance of,
and procedures for, pretesting, and Chapter XII gives respondent characteristics ...




Army  Project  Number 
2Q763731A775 


TCATA

DAHC19-74-C-0032


QUESTIONNAIRE  CONSTRUCTION  MANUAL 

Dr.  Robert  F.  Dyer 
Ms.  Josephine  J.  Matthews 
Dr.  Calvin  E.  Wright 
Dr.  Kenneth  L.  Yudowitch 
Operations  Research  Associates 


REVISED  BY: 

Dr.  Charles  O.  Nystrom,  ARI


Submitted  by: 
George  M.  Gividen,  Chief 
Fort  Hood  Field  Unit 


July  1976 


Approved  by: 


Joseph  Zeidner,  Director 
Organizations  and  Systems 
Research  Laboratory 


J.  E.  Uhlaner,  Technical  Director 
U.S.  Army  Research  Institute  for
the  Behavioral  and  Social  Sciences 


T/C  Page  1 
1 Jul  76 


TABLE  OF  CONTENTS 


I.  Introduction 

A.  Purpose  and  Organization  of  This  Manual 

B.  Definition  of  Questionnaire

C.  Conventions  Used  in  This  Manual

D.  Keeping  This  Manual  Up  to  Date 

E.  Reporting  Problems  and  Suggestions  for  Improvement


II.  Major  Questionnaire  Types  and  Administration  Procedures 

A.  Overview 

B.  Types  of  Questionnaires  Discussed  in  This  Manual

C.  Ways  That  Questionnaires  Can  Be  Administered 

D.  Structured  Interviews  Versus  Other  Types  of  Questionnaires 


III.  Content  of  Questionnaire  Items 

A.  Overview 

B.  Determining  Questionnaire  Content  Preliminary  Research 

C.  Other  Considerations  Related  to  Questionnaire  Content 


IV.  Types  of  Questionnaire  Items 

A.  Overview 

B.  Open-Ended  Items 

C.  Multiple  Choice  Items 

D.  Rating  Scale  Items 

E.  Ranking  Items 

F.  Forced  Choice  Items 

G.  Card  Sorting  Items /Tasks 

H.  Semantic  Differential  Items 

I.  Other  Types  of  Items 


V.  Attitude  Scales  and  Scaling  Techniques

A.  Overview 

B.  Thurstone  Scales

C.  Likert  Scales 

D.  Guttman  Scales 

E.  Other  Scaling  Techniques




VI.  Preparation  of  Questionnaire  Items

A.  Overview 

B.  Mode  of  Items 

C.  Wording  of  Items 

D.  Difficulty  of  Items 

E.  Length  of  Question/Stem

F.  Order  of  Question/Stems 

G.  Number  of  Response  Alternatives 

H.  Order  of  Response  Alternatives 


VII.  Response  Anchoring

A.  Overview 

B.  Types  of  Response  Anchors 

C.  Anchored  Versus  Unanchored  Scales 

D.  Amount  of  Verbal  Anchoring 

E.  Procedures  for  the  Selection  of  Verbal  Scale  Anchors 

F.  Scale  Balance,  Midpoints,  and  Polarity 


VIII.  Empirical  Bases  for  Selecting  Modifiers  for  Response  Alternatives 

A.  Overview 

B.  General  Considerations  in  the  Selection  of  Response  Alternatives 

C.  Selection  of  Response  Alternatives  Denoting  Degree  of  Frequency

D.  Selection  of  Response  Alternatives  Using  Order  of  Merit  Lists  of 
Descriptor  Terms 

E.  Selection  of  Response  Alternatives  Using  Scale  Values  and 
Standard  Deviations 

F.  Sample  Sets  of  Response  Alternatives

IX.  Physical  Characteristics  of  Questionnaires 

A.  Overview 

B.  Location  of  Response  Alternatives  Relative  to  the  Stem 

C.  Questionnaire  Length 

D.  Questionnaire  Format  Considerations

E.  Use  of  Answer  Sheets 


X.  Considerations  Related  to  Questionnaire  Administration 

A.  Overview 

B.  Instructions 

C.  Anonymity  for  Respondents 

D.  Motivational  Factors 

E.  Administration  Time 

F.  Characteristics  of  Administrators 




G.  Administration  Conditions 

H.  Training  of  Field  Test  Evaluators 

I.  Other  Factors  Related  to  Questionnaire  Administration 


XI.  Pretesting  of  Questionnaires

A.  Overview 

B.  Guidelines  for  Pretesting  Questionnaires


XII.  Characteristics  of  Respondents  That  Influence  Questionnaire 
Results 

A.  Overview 

B.  Social  Desirability  and  Acquiescence  Response  Sets

C.  Other  Response  Sets  or  Errors 

D.  Effects  of  General  Pretest  Attitudes  of  Respondents

E.  Effects  of  Demographic  Characteristics  of  Responses 

XIII.  Evaluating  Questionnaire  Results 

A.  Overview 

B.  Scoring  Questionnaire  Responses 

C.  Data  Analyses 


XIV.  Interview  Considerations 


A.  Overview 

B.  Structured  and  Unstructured  Interviews 

C.  Interviewer's  Characteristics  Relative  to  Interviewee

D.  Situational  Factors 

E.  Training  Interviewers 

F.  Data  Recording  and  Reduction 

G.  Special  Interviewer  Problems 


ANNEX.  Literature  Survey  and  Bibliography 




LIST OF FIGURES


Figure      Title                                               Section   Page

IV-B-1      Examples of Open-Ended Items                        IV-B      1

IV-C-1      Examples of Multiple Choice Items                   IV-C      2

IV-D-1      Examples of Numerical Rating Scale Items            IV-D      1

IV-D-2      Examples of Graphic Rating Scale Item               IV-D      2

IV-E-1      Examples of Ranking Items                           IV-E      2

IV-F-1      Examples of Forced Choice Items                     IV-F      2

IV-H-1      Examples of Semantic Differential Items             IV-H      2

IV-I-1      Examples of Check Lists                             IV-I      1

IV-I-2      Examples of Formats Providing for Supplementary     IV-I      3
            Responses

VI-C-1      Examples of Question Form and Incomplete            VI-C      2
            Statement Form of Stem

VI-C-2      An Insufficiently Detailed Question Stem,           VI-C      4
            Plus Revision

VI-C-3      Examples of Loaded Questions                        VI-C      7

VI-C-4      Examples of Leading Questions                       VI-C      8

VI-C-5      Example of a Question Asking the Respondent to      VI-C      9
            Criticize

VI-C-6      Examples of Double-Barreled Questions and           VI-C      10
            Alternatives

VI-C-7      Example of Ambiguous Question and Alternative       VI-C      11

VI-C-8      Alternate Ways of Expressing Directionality         VI-C      13
            and Intensity

VI-D-1      Example of Hard to Understand Item and              VI-D      1
            Alternative

VI-H-1      Example of Rating Scale Item with Alternate         VI-H
            Response Alternatives Order

VII-B-1     Types of Response Anchors                           VII-B     2

VII-F-1     Examples of Scale Balance, Midpoints, and           VII-F
            Polarity

VIII-B-1    Two Formats Using "Outstanding" and "Superior"      VIII-B

VIII-B-2    Response Alternatives Frequently Recommended        VIII-B
            by ARI

IX-B-1      Arrangement of Items With Same Rating Scale         IX-B
            Response Alternatives

X-C-1       A Second Example of a Privacy Act Statement         X-C


LIST OF TABLES


Table       Title                                               Section   Page

VIII-B-1    Words Considered Unrateable by Subjects             VIII-B    2

VIII-B-2    Words Exhibiting Bimodality of Response             VIII-B    3

VIII-B-3    Sample List of Phrases Denoting Degrees of          VIII-B    5
            Acceptability

VIII-B-4    A Second Sample List of Phrases Denoting            VIII-B    5
            Degrees of Acceptability

VIII-B-5    Neutral Term Scale Values and Standard              VIII-B    7
            Deviations as Determined by Several Different
            Studies

VIII-C-1    Degrees of Frequency                                VIII-C    1

VIII-D-1    Order of Merit of Selected Descriptive Terms        VIII-D    1

VIII-D-2    Order of Merit of Descriptive Terms Using           VIII-D    2
            "Use" as a Descriptor

VIII-E-1    Acceptability Phrases                               VIII-E    2

VIII-E-2    Degrees of Excellence: First Set                    VIII-E    3

VIII-E-3    Degrees of Excellence: Second Set                   VIII-E    4

VIII-E-4    Degrees of Like and Dislike                         VIII-E    5

VIII-E-5    Degrees of Good and Poor                            VIII-E    6

VIII-E-6    Degrees of Good and Bad                             VIII-E    7

VIII-E-7    Degrees of Agree and Disagree                       VIII-E    8

VIII-E-8    Degrees of More and Less                            VIII-E    9

VIII-E-9    Degrees of Adequate and Inadequate                  VIII-E    10

VIII-E-10   Degrees of Acceptable and Unacceptable              VIII-E    11

VIII-E-11   Comparison Phrases                                  VIII-E    13

VIII-E-12   Degrees of Satisfactory and Unsatisfactory          VIII-E    14

VIII-E-13   Degrees of Unsatisfactory                           VIII-E    14

VIII-E-14   Degrees of Pleasant                                 VIII-E    15

VIII-E-15   Degrees of Agreeable                                VIII-E    15

VIII-E-16   Degrees of Desirable                                VIII-E    16

VIII-E-17   Degrees of Nice                                     VIII-E    16

VIII-E-18   Degrees of Adequate                                 VIII-E    17

VIII-E-19   Degrees of Ordinary                                 VIII-E    17

VIII-E-20   Degrees of Average                                  VIII-E    18

VIII-E-21   Degrees of Hesitation                               VIII-E    18

VIII-E-22   Degrees of Inferior                                 VIII-E    19

VIII-E-23   Degrees of Poor                                     VIII-E    19

VIII-E-24   Descriptive Phrases                                 VIII-E    20

VIII-F-1    Sets of Response Alternatives Selected so           VIII-F
            Phrases are at Least One Standard Deviation
            Apart and Have Parallel Wording

VIII-F-2    Sets of Response Alternatives Selected so           VIII-F    4
            That Intervals Between Phrases are as Nearly
            Equal as Possible

VIII-F-3    Sets of Response Alternatives Selected from         VIII-F    6
            Lists Giving Scale Values Only

VIII-F-4    Sets of Response Alternatives Selected Using        VIII-F    7
            Order of Merit Lists of Descriptor Terms



I-A  Page  1
1 Jul  76 


Chapter  I:  Introduction

A.  Purpose  and  Organization  of  This  Manual 

1.  Purpose 

This manual has been prepared primarily for the use and guidance
of those who are tasked to develop and/or administer questionnaires
as part of Army field tests and evaluations, such as
those conducted at the TRADOC Combined Arms Test Activity
(TCATA) and the Combat Developments Experimentation Command
(CDEC). The general content and concepts, however, are
applicable to a variety of situations. As such, the manual
should prove useful to all individuals involved in the construction
and administration of surveys, interviews, or questionnaires.

2.  Organization 

Information  and  guidance  relating  to  the  preparation  of  items 
for  questionnaires  and  for  their  assembly  and  arrangement  into 
a complete  questionnaire  are  presented  in  Chapters  II  through  X. 
Chapter  XI  discusses  the  importance  of,  and  procedures  for,
pretesting  questionnaires  prior  to  their  regular  administration. 
Chapter  XII  discusses  characteristics  of  respondents  that 
influence  questionnaire  results.  The  analysis  and  evaluation 
of  responses  to  a questionnaire  are  briefly  dealt  with  in 
Chapter  XIII.  Finally,  a number  of  considerations  regarding 
the  presentation  of  questions  by  means  of  an  interview  are 
discussed  in  Chapter  XIV. 




I-B  Page  1 
1 Jul  76 


B.  Definition  of  Questionnaire 

As  used  in  this  manual,  the  word  "questionnaire"  refers  to  an 
ordered  arrangement  of  items  (questions,  in  effect)  intended  to 
elicit  the  evaluations,  judgments,  comparisons,  attitudes,  beliefs, 
or  opinions  of  personnel.  The  content  and  format  of  the  items  may 
vary  widely.  A visual  mode  of  presenting  the  items  is  employed. 

In the past, this meant that the items were typed or printed on
paper, but now items can also be presented by closed circuit
television or on a cathode ray tube under the control of a computer
program. If the items are first read by an interviewer and then
given verbally to the respondent, the questionnaire may also be
termed a "structured interview." Hence, questionnaires and
interviews have some common properties. Questionnaire items have
traditionally been answered by writing words or marks with a pen
or pencil, but this aspect too has been enlarged to include typed,
punched, and verbal responses.

While questionnaires are "data collection forms," not all data
collection forms are questionnaires. Those forms used by personnel
to enter instrument readings or to record their counts or observations
(e.g., time of first detection, number of targets correctly
identified, number of rounds fired) are not directly addressed in
this manual.


I-C  Page  1 
1 Jul  76 




C.  Conventions  Used  in  This  Manual 


1.  Identification  Scheme  Used 

This manual has been prepared in outline form to facilitate
cross-referencing and later updating. The identification
scheme that is used employs Roman numerals, capital and small
letters, and numbers in the sequence: I A 1 a (1) [1] [a].

The major divisions, I, II, III, IV, etc., are called chapters.
All other subdivisions are called "sections," with sections
starting with capital letters (A, B, etc.) called "major
sections." You are now, for example, reading Section I-C 1.

To facilitate later updating, references within the manual
are to sections and not pages.

2.  Pagination 

Each  major  section  of  this  manual  (e.g.,  I-C)  starts  on  a new 
page,  and  pages  are  numbered  within  each  major  section.  For 
example,  this  is  Section  I-C  Page  1,  or  the  first  page  of 
Section  I-C. 

3.  Page  Update  Date 

Immediately under each page number is the date that the page
was drafted or revised. When a page has been revised, the
date of the immediately previous version is also given in
parentheses with the letter "s" meaning "superseded." For
example, if I-D Page 1 dated 1 Jul 76 is revised on 10 Oct 76,
the page number on the revised page would appear as:

I-D  Page  1 
10  Oct  76 
(s.  1 Jul  76) 
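
The page-label convention described in this section can be sketched in a few lines of Python. This is only an illustration, not part of the manual's procedures; the function name `page_label` and its parameter names are assumptions introduced here, and dates are passed in as already-formatted strings.

```python
from typing import Optional


def page_label(section: str, page: int, date: str,
               superseded: Optional[str] = None) -> str:
    """Build the label printed at the top of a manual page.

    All names here are hypothetical. `date` and `superseded` are
    already-formatted strings such as "1 Jul 76".
    """
    lines = [f"{section} Page {page}", date]
    if superseded is not None:
        # A revised page also shows the date of the version it replaces.
        lines.append(f"(s. {superseded})")
    return "\n".join(lines)


# A revised page: prints the three-line label from the example above.
print(page_label("I-D", 1, "10 Oct 76", superseded="1 Jul 76"))
```

A page that has never been revised simply omits the third line, for example `page_label("I-C", 1, "1 Jul 76")`.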

4.  Table  and  Figure  Identification 


Both tables and figures are numbered sequentially within a
major section, with a hyphen before the table or figure
number. Examples are: Table VIII-B-1, Table VIII-B-2,
Figure VI-A-1.
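
The numbering rule for tables and figures can likewise be sketched as a small counter. This is a hedged illustration: the class name `CaptionNumberer` and its method are hypothetical, and the sketch assumes that tables and figures are counted separately within each major section, which is consistent with the List of Figures and List of Tables both containing an item numbered VIII-B-1.

```python
from collections import defaultdict


class CaptionNumberer:
    """Hand out table and figure identifiers, numbered within a major section."""

    def __init__(self) -> None:
        # One running count per (kind, major section), e.g. ("Table", "VIII-B").
        self._counts = defaultdict(int)

    def next_id(self, kind: str, major_section: str) -> str:
        key = (kind, major_section)
        self._counts[key] += 1
        # The hyphen precedes the sequential table or figure number.
        return f"{kind} {major_section}-{self._counts[key]}"


numberer = CaptionNumberer()
print(numberer.next_id("Table", "VIII-B"))   # Table VIII-B-1
print(numberer.next_id("Table", "VIII-B"))   # Table VIII-B-2
print(numberer.next_id("Figure", "VI-A"))    # Figure VI-A-1
```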


I-D  Page  1 
1 Jul  76 




D.  Keeping  This  Manual  Up  to  Date 

1.  Updated  Pages  Should  be  Inserted  as  Received 

It  is  anticipated  that  sections  of  this  manual  will  be 
periodically  corrected,  revised,  or  otherwise  updated.  New 
pages  should  be  inserted  as  soon  as  they  are  received.  This 
will  not  only  keep  the  manual  up  to  date,  but  will  facilitate 
adding  pages  received  at  an  even  later  date.  Appropriate 
instructions  covering  which  pages  to  add  and  delete  will 
accompany  distributed  update  pages.  When  it  appears  useful, 
a list  will  also  be  provided  showing  the  page  numbers  and 
dates  of  all  pages  that  should  be  in  the  manual  at  that  time. 

2.  Request  for  Updates 

To be placed on the distribution list to receive updates to
this manual, write to:

Chief
ARI Field Unit-Fort Hood
HQ TCATA (PERI-OH)
Fort Hood, Texas 76544


I-E  Page  1 
1 Jul  76 


E.  Reporting  Problems  and  Suggestions  for  Improvement 

As  previously  noted,  it  is  anticipated  that  this  manual  will 
periodically  be  updated  to  improve  its  utility.  To  report  errors, 
problems,  or  suggestions,  write  to: 

Chief
ARI Field Unit-Fort Hood
HQ TCATA (PERI-OH)
Fort Hood, Texas 76544




II-A  Page  1 
1 Jul  76 


Chapter  II:  Major  Questionnaire  Types  and  Administration  Procedures

A.  Overview 


This chapter briefly summarizes the different types of questionnaires
discussed in this manual (Section II-B) and ways that questionnaires
may be administered (Section II-C). Detailed guidelines regarding
which one to use in a given situation are included in subsequent
chapters. Issues to consider when deciding whether to use a structured
interview or some other type of questionnaire are presented in
Section II-D, which also notes that combinations of methods may be
employed. It is concluded that both structured interviews and other
types of questionnaires have their place, and both have limitations.


II-B  Page  1 
1 Jul  76


B.  Types  of  Questionnaires  Discussed  in  This  Manual 

There are a number of techniques of data collection that can be used to
measure human attributes, attitudes, and behavior. Some of these methods
are observation, personal and public records, specific performances,
sociometry, interviews, questionnaires, rating scales, pictorial techniques,
projective techniques, achievement testing, and psychological testing.

For this manual, however, attention has been restricted to a more limited
number of data collection techniques: certain paper-and-pencil types of
instruments broadly classed as questionnaires as defined in Section I-B,
and including only some of the techniques mentioned above. A distinction
has also been made in this manual between open-ended questionnaire items
and closed-ended items. Open-ended items are those which permit the
respondent to express his opinions in his own words and to indicate any
qualifications he wishes. Closed-ended items, on the other hand, utilize
response alternatives, such as multiple choice or true-false. Structured
interviews are included within the definition of questionnaires used,
since typically an interview form is developed and used by an interviewer
both for asking questions and recording responses, much like a
self-administered questionnaire. On the other hand, the unstructured interview
makes no use of structured data collection forms. The interviewer is
permitted to discuss the subject matter as he sees fit with no particular
order or sequence. Of course, other interviews fall somewhere between
these two extremes. In any case, unstructured interviews, where no
structured response forms are used, are not included within the definition
of questionnaires used in this manual.


II-C  Page  1 
1 Jul  76 


C.  Ways  That  Questionnaires  Can  Be  Administered

There  are  a number  of  respects  in  which  questionnaire  administration  may 
vary.  However,  in  the  usual  field  test  settings,  the  modal  questionnaire 
administration  situation  involves  paper-and-pencil  materials  with  the 
author/test  officer  administering  the  questionnaire  face-to-face  with 
a group of test players or evaluators.

1.  Group  Versus  Individual  Administration 

Given  a printed  questionnaire,  calendar  time  is  saved  by  group 
administration.  The  task  of  statistical  analysis  can  be  initiated 
with  less  delay  than  if  one  were  waiting  on  a series  of  individual 
administrations. An important determinant of whether to use group or
individual administration is the time at which people complete their
participation in the test.
Most  often  all  participants  are  through  at  the  same  time.  All  would 
be  available  for  questionnaire  administration  as  soon  as  they  could 
be  brought  to  an  appropriate  place  or  places.  Prompt  group 
administration  gives  the  same,  short  amount  of  time  for  forgetting 
about  test  events  to  those  who  become  the  respondents.  If  there  is 
an  administrator,  his  time  is  conserved  directly  in  proportion  to 
the  number  of  respondents  he  has  in  each  administrative  session. 
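
The proportionality noted at the end of this section can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name `administrator_minutes`, its parameters, and the figures used are assumptions, and it further assumes that a session takes the same time regardless of how many respondents attend.

```python
import math


def administrator_minutes(respondents: int, group_size: int,
                          minutes_per_session: float) -> float:
    """Total administrator time when respondents are seen `group_size` at a time.

    Hypothetical helper; assumes session length does not depend on group size.
    """
    if respondents < 0 or group_size < 1:
        raise ValueError("respondents must be >= 0 and group_size >= 1")
    # A partly filled final session still costs a full session.
    sessions = math.ceil(respondents / group_size)
    return sessions * minutes_per_session


# Illustrative figures only: 30 respondents, 20-minute sessions.
individual = administrator_minutes(30, group_size=1, minutes_per_session=20)
grouped = administrator_minutes(30, group_size=15, minutes_per_session=20)
print(individual, grouped)  # 600 40
```

Under these assumed figures, two group sessions replace thirty individual ones, which is the saving of administrator time that the paragraph describes.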

2.  Author-Administered  Questionnaires 

When the test officer or administrator who is familiar with the content
of the questionnaire and the test's purposes/objectives can administer
the questionnaire, some advantages can be gained. The
administrator's instructions and appeals may increase the number of
respondents having desirable motivation to complete the questionnaire,
giving appropriate consideration to each item. If one employs
a self-administration procedure such as might occur in a mailed-out
questionnaire, or if a poorly prepared stand-in plays
the role of administrator, then the respondents must derive their
instructions and some of their motivation from printed instructions
(or from the poorly prepared stand-in). More things usually can
end up going wrong when questionnaires are self-administered than
when they are administered by a test administrator.

3.  Remote  Administrations 


From  the  test  officer's  point  of  view  this  refers  to  a questionnaire 
administration  event  that  he  cannot  conduct  because  of  its  distance 
from  him  and/or  other  demands  on  his  time.  This  dimension,  remote 
versus  face-to-face,  is  similar  but  not  identical  to  the  previously 
noted dimension, self-administered versus author-administered.




To avoid the possible disadvantages of self-administered questionnaires,
the test officer must be able to afford another administrator,
train him in the knowledge and skills associated with effective
administration, and transport him to the "remote" administration
location. If multiple administrations are required, with location or
timing differences that preclude the same administrator from handling
them all, it would appear that the chances are increased that
more respondents will experience more "difficulties" in answering
the questions.

4.  Other  Materiel  Modes

While providing the respondent with a printed questionnaire form
and a pencil to mark/write his responses is the most common
questionnaire administration procedure in field evaluations,
other presentation modes have been used. In a card-sorting
procedure that has been used with individuals and groups, each
respondent reads statements of candidate problems and then places
the slip in one of "n" piles according to his judgment of the
severity of the "problem." Rarer, because of expense and logistics
problems, is the setting up of a computer terminal where each respondent
enters (types in) answers to questions that are displayed on a
cathode ray tube (or other computer display device).

Chapter X presents many other considerations related to
questionnaire administration.


II-D  Page  1 
1 Jul  76 


D.  Structured  Interviews  Versus  Other  Types  of  Questionnaires 
1.  Issues  to  Consider 


When  deciding  whether  to  use  a structured  interview  or  another 
type  of  questionnaire,  a number  of  issues  should  be  considered. 

Included  are  the  following: 

a.  If a structured interview is used, there must be enough
qualified interviewers to expeditiously process all interviewees.
Sometimes there are only a few personnel to be
interviewed, or there is plenty of time available for
interviews, so only one or two interviewers will be necessary.
In other situations only an hour or so may
be available per interviewee; in these cases a large number
of qualified interviewers must be available.

b.  In  most  cases,  respondents  have  a greater  tendency  to  answer 
open-ended  questions  in  an  interview  than  when  response  is 
by  paper  and  pencil. 

c.  Paper-and-pencil  questionnaires  may  be  less  expensive, 
more  anonymous,  and  completed  faster  than  the  same  number 
of  interviews. 

d.  Respondents seem to be less likely to report unfavorable
things in an interview than in an anonymous questionnaire.
Typically, questionnaires are also more likely than interviews
to produce self-revealing data.

e.  Issues involving socially acceptable or unacceptable
attitudes and behaviors will elicit more bias in
interviewees' responses.

f.  During interviews, respondents often have a tendency to
try to support the norms that they assume the interviewer
adheres to.

g.  Interviewers  with  biases  on  the  issues  under  discussion 
may  reflect  them  in  the  content  they  record  as  well  as 
in  what  they  fail  to  record. 




h.  Although  a structured  interview  using  open-ended  questions
may  produce  more  complete  information  than  a typical 
questionnaire  containing  the  same  questions,  empirical 
research  seems  to  indicate  that  responses  to  the  typical 
questionnaire  are  more  reliable;  i.e.,  more  consistent. 

2.  Combinations  of  Methods 

There are some situations where a combination of methods of
questioning might be used:

a.  An interview might be used to obtain information for
designing a paper-and-pencil questionnaire.

b.  Personal interviews or telephone interviews might be used
for respondents who do not return questionnaires administered
remotely (such as mail questionnaires).

c.  When respondents are unable to give complete information
during an interview, they can be left a copy of a questionnaire
to complete and mail in, so that the necessity for a
call-back is eliminated.

3.  Conclusion 


Both structured interviews and other types of questionnaires
appear to have their advantages and disadvantages. The choice
of which to use may well depend upon costs, which are generally
lower for the typical questionnaire. The typical questionnaire
is apparently more reliable, while the structured interview
may provide more unique and more abundant information. If the
dimensions of a problem have not been explored before, the
best compromise would appear to be to use the interview
approach with open-ended items to uncover the dimensions,
and follow this by the use of the paper-and-pencil questionnaire
with closed-ended items to obtain more specific information.


III-A  Page  1 
1 Jul  76 


Chapter  III:  Content  of  Questionnaire  Items 


A.  Overview 

The recommended general steps in preparing a questionnaire include
preliminary planning, determining the content of questionnaire
items, selecting question forms, wording of questions, formulating
the questionnaire, and pretesting. As part of preliminary planning,
the information required has to be determined, as do procedures
required for administration, sample size, location, frequency of
administration, experimental design of the field test, and analyses
to be used. Selecting question forms is a function of the content
of the questionnaire items and requires knowledge of types of
questionnaire items and scaling techniques. The wording of questions
is the most critical and most difficult step. Formulating
the questionnaire includes formatting, sequencing of questions,
consideration of data reduction and analysis techniques, determining
basic data needed, and insuring adequate coverage of required
field test data. Pretesting involves using a small but representative
group to insure that all questions are understandable and
unambiguous.

This  chapter  considers  the  content  of  questionnaire  items. 
Methods  for  determining  questionnaire  content  are  discussed  first, 
and  then  other  considerations  related  to  questionnaire  content 
are  presented.  The  other  steps  noted  above  are  discussed  in
subsequent  chapters. 


III-B  Page  1 
1 Jul  76 




B.  Determining  Questionnaire  Content  Preliminary  Research


1.  Preliminary  Research 


If  you  have  the  job  of  developing  a questionnaire  for  a field 
test,  there  are  several  things  that  should  be  done  before  starting 
to  write  questionnaire  items. 


a.  Learn the test's objectives. Read the Outline Test Plan in
order to learn what it says the test's purpose, scope, and
objectives are. All data collection effort, including
questionnaire administration, should be consistent with
and supportive of the test's objectives.


b.  What performance measures are planned for the test? One may
be fortunate enough to be involved with a test for which the
Detailed Test Plan has to a large extent been written. Try
to discover what performance measures/data are to be collected.
If performance data are to be collected on some aspects of the
functioning of the system to be tested, then it may not be
necessary to assess these functions via questionnaire items.


c.  Consult others and prior test plans and reports. Many tests
at CDEC and TCATA (and elsewhere) follow up, or are similar to,
prior testing. As a consequence, information may be readily
available regarding prior related or similar tests. Test
files or the Technical Information Center may provide a
source for obtaining test plans and reports on relevant
prior tests conducted by Army field test/experimentation
agencies.


2.  Using Interviews to Determine Questionnaire Content


If  one's  degree  of  experience  seems  meager  relative  to  the 
complexities  of  the  evaluation  problem,  he  may  employ  group 
and/or  individual  interviews  to  assist  in  determining  question- 
naire content.  Preferably  this  would  be  done  after  taking  the 
steps  noted  above.  The  less  one  knows  about  a subject,  the  less 
structure  one  can  impose  on  an  interview  dealing  with  the  subject. 


a.  Conducting an unstructured group interview. Personnel are
needed who have relevant operating experience with the system
to be tested/evaluated, or with a sufficiently similar system.
Arrange a common meeting place and time with about five to
seven of them. It would be advantageous to have a meeting
place that was not cramped for space, had comfortable chairs,
a comfortable temperature, and where all discussants were
free from other sources of distraction (sights and sounds,
mainly).

If  the  interviewer's  age  and  rank  are  several  steps 
above  or  below  the  age  and  rank  of  the  members  of  a homogeneous 
group  of  discussants,  try  (before  the  meeting)  to  get  a person 
who is their contemporary (peer) in age and rank to lead and
coordinate  the  discussions.  Why?  Because  a mismatch  may  inhibit 
their  discussion  or  produce  too  much  submissive,  agreeing 
behavior  on  their  part. 

If notes are being taken or the discussion is being tape
recorded, one should be unobtrusive about it. Don't shove or
point a microphone at a person as he starts to speak. He may
be inhibited by this, or he may become a "ham".

The  first  several  minutes  should  be  spent  in  establishing 
rapport  with  the  group.  The  purpose  of  the  session  should  be 
covered,  introduction  of  group  members  made,  and  other  warmup 
devices  used.  The  objective  is  to  motivate  as  many  respondents 
to  give  comments  as  possible.  In  the  remainder  of  the  session 
any  or  all  of  the  following  information-eliciting  devices 
could  be  used: 

(1)  Discuss samples of the control item. Ask the general
question: "What problems have you had with this piece
of equipment or system?" Follow up with who, what, where,
when and why. Attempt to maximize the number of potential
or actual problems posed. Strive for clarification of
problem ideas, but do not criticize the comments, even
if they are redundant with a previous contribution.

(2)  Ask: "What do you consider to be the most important
features (characteristics, qualities, etc.) of this
equipment or system when used in the field?" Strive to
get a multitude of adjectives and phrases here (e.g., ease
of operation, weight, durability, portability, etc.).

(3)  Use the aided recall technique: "Can you remember where
and when you have encountered problems with this system?"
(e.g., at night, when it's damp, etc.).

The  recorded  comments  should  be  categorized  and  arranged 
by  frequency.  For  example,  how  many  of  the  comments  on  system 
operation  stressed  failure  considerations? 
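
As an illustration only, the tallying step can be sketched in Python.
The category labels and comments below are hypothetical, and the
assignment of each comment to a category is assumed to have been done
by hand beforehand. The sketch simply counts comments per category and
lists the categories from most to least frequent.

```python
from collections import Counter

# Hypothetical coded comments: (category, comment text).
# The categories were assigned by a human reader, not by this code.
coded_comments = [
    ("failure", "Radio cut out twice during the night move"),
    ("weight", "Harness is too heavy on long marches"),
    ("failure", "Battery died after two hours"),
    ("ease of operation", "Controls are hard to reach with gloves on"),
    ("failure", "Antenna connector came loose"),
    ("weight", "Pack shifts the load onto one shoulder"),
]


def tally_by_category(comments):
    """Return (category, count) pairs, most frequent category first."""
    counts = Counter(category for category, _ in comments)
    return counts.most_common()


for category, count in tally_by_category(coded_comments):
    print(f"{category}: {count}")
```

A tally of this kind answers the example question directly: here three
of the six hypothetical comments stress failure considerations.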




b.  Conduct  semistructured  personal  interviews.  As  a next  step, 
or  as  an  alternative  step  to  the  group  interview,  one  may 
employ  a small  number  of  representative  respondents  in  a 
person-to-person  interview  format.  Information  produced 
from  the  unstructured  group  interviews  provides  general 
guidance  to  the  specific  evaluative  information  desired. 

In this method of interviewing, the interviewer is given
only general instructions on the type of information desired.
He is left free to ask the necessary direct questions to obtain
this information, using the wording and the order that seems
most appropriate in the context of each interview. These
interviews, like the unstructured group sessions, are useful
in obtaining a clearer understanding of problems, and in
determining what areas (evaluation criteria) should be
included on the final questionnaire.

The only structure to the semistructured interview comes
from a set of question categories that must be raised sometime
during the interview. Questions on system experience, positive
and negative features, and problems in field use, for example,
can be phrased in any manner or sequence. Probing questions
of the type: "Why do you feel that way?", "What do you mean by
that statement?", and "What other reasons do you have?" can be
utilized until the interviewer is satisfied that he has the
necessary information considering time limitations, data
requirements, and the willingness and ability of the respondents
to verbalize their views.

In the semistructured interview, the interviewer has some
flexibility in formulating and asking questions. This technique
can, therefore, be only as effective in obtaining complete,
objective, and unbiased information as the interviewer is
skilled in formulating and asking questions. Thus interviewers
may have to be trained in using this technique.

c.  Develop  the  questionnaire.  The  use  of  the  unstructured  and 
semistructured interviews as discussed above should enable
the  formulation  of  a questionnaire  to  obtain  evaluative 
information.  These  interviews  will  provide  guidance  to  the 
formulation  of  a sound  survey  instrument  in  the  following 
respects: 

(1)  A better  understanding  of  the  factors  or  criteria  which 
make  up  the  mental  set  of  individuals  in  evaluating 
systems and equipment.

(2)  Some idea of the range of favorable and unfavorable
opinions toward the system for each factor.

(3)  Tentative  knowledge  of  individual  and  group  differential 
opinions  toward  the  system  tested. 

Therefore, before drafting the formal questionnaire, the
researcher must have a feel for: question categories (e.g.,
problem areas, positive aspects); response categories (e.g.,
evaluative factors); and the type of system operations
information which is needed (e.g., in evaluating a new helmet
suspension system, does the respondent wear eyeglasses?).

3.  Using  the  Critical  Incident  Technique  to  Determine  Questionnaire 
Content 


The  critical  incident  technique  consists  of  a set  of  procedures  for 
collecting  direct  observations  of  human  behavior  in  such  a way  as  to 
facilitate  their  potential  usefulness  either  in  solving  practical 
problems or in developing broad psychological principles. The
technique calls for collecting observed incidents of behavior that have
special  significance  and  meet  systematically  defined  criteria.  It 
can  be  of  assistance,  therefore,  in  helping  to  determine  the  content 
of  items  to  be  included  in  a questionnaire. 

Although  there  are  a number  of  variations  in  the  critical  incident 
technique,  the  basic  procedure  consists  of  collecting  records  of 
specific  behaviors  related  to  the  topic  of  concern.  The  behaviors 
might  be  noted  by  observers,  or  individuals  can  be  asked  to  recall 
and  record  past  specific  behaviors  judged  to  provide  significant 
or  critical  evidence  related  to  the  topic  of  concern.  As  appro- 
priate, behaviors  related  both  positively  and  negatively  to  the 
area  of  concern  should  be  noted.  The  records  of  behavior  that 
are  collected  can  then  be  analyzed  and  used  as  a basis  for  deter- 
mining questionnaire  content. 

One  of  the  examples  of  the  use  of  the  critical  incident  technique 
reported  by  Flanagan  in  the  article  noted  in  Section  III-B  3,  had 
to  do  with  a study  of  combat  leadership  in  the  United  States  Army 
Air  Forces  in  1944.  It  represented  "the  first  large-scale,  system- 
atic effort  to  gather  specific  incidents  of  effective  or  ineffec- 
tive behavior  with  respect  to  a designated  activity.  The 
instructions  asked  the  combat  veterans  to  report  incidents  observed 
by  them  that  involved  behavior  which  was  especially  helpful  or 
inadequate in accomplishing the assigned mission. The statement
finished with the request, 'Describe the officer's action. What
did he do?' Several thousand incidents were collected in this way
and analyzed to provide a relatively objective and factual
definition of combat leadership. The resulting set of descriptive
categories was called the 'critical requirements' of combat
leadership" (p. 328).



For  more  information  on  the  critical  incident  technique,  see, 
for example, the following two sources:

a.  Barnes,  T.  I.  The  critical  incident  technique.  Sociology 
and  Social  Research,  1960,  44,  345-347. 

b.  Flanagan, J. C. The critical incident technique.
Psychological Bulletin, 1954, 51, 327-358.

4.  Using  Impressions  of  a Topic  to  Determine  Attitude  Scale  Content 

When  the  questionnaire  is  an  attitude  scale,  a useful  method  for 
selecting  items  for  it  is  to  ask  a group  of  individuals  to  write 
six  statements  giving  their  impressions  of  a topic,  such  as  Army 
pay.  From  these,  some  smaller  number  of  statements  can  be  selected 
that  are  readable,  intelligible,  and  capable  of  classification. 
These  statements  can  then  be  sorted  into  several  categories,  such 
as the status of the topic and its good and bad features.


C.  Other Considerations Related to Questionnaire Content




This section discusses a number of topics related to questionnaire
content: questions that should be asked related to questionnaire
content; sources of bias in questionnaire construction; and
characteristics of good questions that affect questionnaire content.

1.  Questions That Should Be Asked Related to Questionnaire Content

Asking  yourself  the  following  five  questions  may  lay  the  foun- 
dation for  a far  more  valuable  questionnaire  than  would  other- 
wise be  produced: 

a.  Who needs the information? Knowledge of who needs the
information  will  provide  a source  in  the  event  answers 
are  needed  to  the  following  four  questions. 

b.  What decisions will be made based on your information?
This will tell in part why the information is needed.
Depending on what decision is going to be made, some kinds
of information will make a difference and should be
collected, and other kinds will not.

Suppose, for example, information is to be collected
as a part of a test comparing a new item of equipment
with an old standard item. The nature of the decision
to be made is clear enough. It will be either selection
of the new equipment, or retention of the old with which
it is being compared. The basis for the decision will
usually also be clear from the small development
requirement (SDR) or qualitative materiel requirement
(QMR) which led to the development of the item being
tested. Analysis of the QMR will identify the qualitative
requirements the new equipment must have, and will give
the start needed to develop questions.

c.  What facts will affect the decision? While this may be a
difficult question to answer, trying to do so should identify
items of information that should be sought with the
questionnaire. It may also head off the collection of unnecessary
information.

d.  Whom are you asking? To get good information, not only must
a good question be asked, but it must be asked of someone
who has the answer. It would not, for example, be reasonable
to ask support troops in a supply depot questions about combat
operations.




e.  What are the consequences of a wrong answer? While this basically
is an administrative question, it has an important bearing on
field questionnaire design. Clearly, if it makes little
difference which of two alternatives is chosen, it makes little
difference if the information is collected. On the other hand,
if there is a chance that substantial dollar savings will result
from the use of a more effective training technique, or that
millions of dollars will be wasted by buying a new piece of
equipment which is not better than the old, it is necessary
to design tests very well, and ask the right questions with
great care.

2.  Sources  of  Bias  in  Questionnaire  Construction 

Two primary sources of bias in questionnaire construction that
have  been  identified  are  investigator  bias  and  question  bias. 

a.  Investigator bias arises from: choice of subject matter;
study design and procedure; unfair or loaded phrasing of
questions; and interpretation and reporting of results.
Sources of such biases include: the questionnaire developer's
relationship with the client; his personal involvement in a
particular theoretical position or research technique; and
those personal traits attributable to class, race, or
political ideology. To reduce the impact of such bias,
questionnaire developers need to: be aware of the problems;
seek critiques from independent sources; carefully review
previously published related reports; and continue to
pursue technical improvement in their investigations.

b.  Four ways that have been suggested of minimizing question
bias when asking opinion questions are: ask many questions
on the same topic; determine by scale analysis whether
questions ask the respondents about the same dimensions of
opinion (see Chapter V); ask "How strongly do you feel
about this?" after each opinion question; and relate the
content of opinion to the intensity of feeling.
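
One simple way to relate the content of opinion to the intensity of
feeling is to compare average intensity across opinion groups. The
Python sketch below is illustrative only: the opinion labels, the
1-to-5 intensity coding (5 = feels very strongly), and the data are
all assumptions, not part of this manual's procedures.

```python
from collections import defaultdict

# Hypothetical paired answers: (opinion, intensity of feeling).
# Intensity is assumed to be coded 1 (not at all strongly) to 5
# (very strongly) from the follow-up question.
responses = [
    ("favorable", 5), ("favorable", 4), ("unfavorable", 2),
    ("favorable", 5), ("unfavorable", 1), ("unfavorable", 3),
]


def mean_intensity_by_opinion(pairs):
    """Return the average intensity score for each opinion group."""
    totals = defaultdict(lambda: [0, 0])  # opinion -> [sum, count]
    for opinion, intensity in pairs:
        totals[opinion][0] += intensity
        totals[opinion][1] += 1
    return {opinion: s / n for opinion, (s, n) in totals.items()}


for opinion, avg in mean_intensity_by_opinion(responses).items():
    print(f"{opinion}: mean intensity {avg:.2f}")
```

In this made-up data the favorable opinions are held more strongly
than the unfavorable ones, which is the kind of relationship the
fourth suggestion asks the analyst to look for.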




Chapter IV:  Types of Questionnaire Items


A.  Overview 


This chapter discusses various types of questionnaire items:
open-ended items (Section IV-B), multiple choice items
(Section IV-C), rating scale items (Section IV-D), ranking items
(Section IV-E), forced choice and paired comparison items
(Section IV-F), card sorting items/tasks (Section IV-G), and
semantic differential items (Section IV-H). For each of these
major item types, definitions and examples are presented,
advantages and disadvantages are noted, and recommendations
regarding their use in Army field test evaluations are given.
Other types of items are noted in Section IV-I: check lists,
matching items, arrangement items, and formats providing for
supplementary responses.

It may be noted that a number of ways have been utilized in
the  professional  literature  for  differentiating  and  classifying 
item  types.  Which  types  are  special  cases  of  other  types  could 
be  debated  at  length.  Unanimous  agreement  with  the  definitions 
given  in  this  manual  cannot,  therefore,  be  anticipated. 


B.  Open-Ended  Items 




1.  Definition  and  Examples 

Open-ended  items  are  those  which  permit  the  respondent  to 
express  his  answer  to  the  questions  in  his  own  words,  and 
to  indicate  any  qualifications  he  wishes.  They  are  like 
general  questions  asked  in  an  unstructured  interview.  By 
contrast, in a closed-ended item, all the answers/choices/
responses permitted are displayed, and the respondent needs
only to check his preferred choice. Examples of open-ended
items are shown in Figure IV-B-1.


Figure  IV-B-1 

Examples  of  Open-Ended  Items 

1.  Describe  any  problems  you  experienced  in  moving  through  the 
test  course  while  wearing  the  new  PRC-99  radio  harness. 


2.  The M16 rifle is:


3.  What  do  you  think  of  the  AR-15  rifle  sight? 


2.  Advantages of Open-Ended Items

a.  Open-ended items allow for the expression of middle opinions
that closed-ended items with two choices would not.

b.  Open-ended items allow for the expression of issues of
concern that may not have been identified by the question writer.

c.  Open-ended  items  provide  unique  information. 

d.  Open-ended  items  are  very  easy  to  ask.  This  is  useful  when 
the  question  writer  either  does  not  know,  or  is  not  certain 
about,  the  range  of  possible  alternative  answers. 

e.  With an open-ended question it is possible to find out what
is salient to the respondent, what his frame of reference
is, and how strongly he feels.

f.  There are times when more valid answers may be obtained
from open- than closed-ended items. For example, there
may be a tendency for respondents to inflate yearly
income figures. Providing response alternatives may
result in an even greater inflation.

3.  Disadvantages of Open-Ended Items

a.  Open-ended  items  are  time  consuming  for  the  respondent. 

b.  A respondent may say that he has no problem rather than
take the time to write out what the problem is. Item 3
in Figure IV-B-1 is poor in this respect, but item 2 is
worse.

c.  Open-ended items often leave the respondent on his own
to determine what is relevant in evaluation. For
instance,  item  2 in  Figure  IV-B-1  leaves  the  respondent 
to  determine  what  is  relevant  in  evaluating  the  M16 
rifle.  This  is  inappropriate;  open-ended  questions  should 
not  be  used  to  bypass  the  understanding  of  operations 
that  the  questionnaire  writer  should  have  or  acquire 
before  he  prepares  the  final  version  of  the  questionnaire. 

d.  Questionnaires  that  use  closed-ended  items  are  generally 
more reliable than those using open-ended items.

e.  Open-ended questions, answered by motivated respondents,
are capable of overloading data analysts. They usually
cannot be handled by machine analysis methods without
lengthy preliminary steps. Analysis of the responses to
an open-ended question usually must be done by someone
who has substantial knowledge about the question's
content, rather than by a statistical clerk. They are often
difficult to code for analyses. Thus the data analysis
problem can grow into a major project unless some other
form of question is used.

f.  Open-ended questions may be easier to misinterpret since
the respondent does not have a set of response alternatives
available which might in themselves provide the proper
frame of reference.

g.  Much of the material obtained from an open-ended question
may be repetitious or irrelevant.




h.  Open-ended  questions  are  subject  to  more  interviewer 
variations  than  closed-ended  questions. 

i.  Open-ended  items  are  often  harder  for  the  respondent  to 
answer  than  closed-ended  questions.  For  example,  a 
respondent  when  asked  his  annual  income  may  have  to 
struggle  to  come  up  with  a relatively  specific  figure, 
whereas when response alternatives are presented he need
only  indicate  one  of  a number  of  ranges  of  income. 

4.  Recommendations  Regarding  Use 

a.  Open-ended  questions  should  be  rarely  used  and,  even 
then,  such  questions  should  sharply  focus  the  respondent's 
attention  and  thereby  reduce  his  writing  burden. 

b.  Sometimes  a good  procedure  is  to  use  an  open-ended  question 
with  a small  number  of  respondents  as  a pretest,  in  order  to 
find  out  what  the  range  of  alternatives  is.  It  may  then  be 
possible  to  construct  good  closed-ended  questions  that  will 
be  faster  to  administer  and  easier  to  analyze. 

c.  Open-ended  questions  are  most  useful  when  there  are  too 
many possible responses to be listed or foreseen; when it
is  important  to  measure  the  saliency  of  an  issue  to  the 
respondent;  or  when  a rapport-building  device  is  needed  in 
an  interview. 

d.  It is sometimes useful to include an open-ended question or
so along with closed-ended questions in order to obtain
verbatim responses or comments that can be used to provide
"flavor"  of  responses  in  a report. 




C.  Multiple  Choice  Items 


1.  Definition and Examples


In  a multiple  choice  item,  the  respondent's  task  is  to  choose 
the  appropriate  or  best  answer  from  several  given  answers  or 
options.  As  used  here,  multiple  choice  items  include 
dichotomous or two-choice items as special cases. And, since
the permitted answers are available for selection, the
multiple choice item may also be termed a closed-ended item.

Examples of multiple choice items are shown in
Figure IV-C-1. Items 3, 4, and 5 are dichotomous or two-way.

A comparison of true-false items with nondichotomous
multiple choice items is made in Section VI-G, since there are
issues related to the number of response alternatives.

2.  Advantages of Multiple Choice Items

a.  As seen in item 2 of Figure IV-C-1, the questionnaire
writer may select different numbers of response
alternatives depending upon his knowledge of the respondent's
experience or depending upon his decision to allow or
disallow respondents to "sit on the fence" by including
a "no preference" alternative. (See Section VI-C for
wording of items, and Section VI-G regarding the number
of response alternatives to employ.)

b.  Dichotomous items are relatively easy to develop, and
permit rapid analyses.

c.  Multiple choice items are easily scored, which means that
data analysis is a relatively inexpensive process requiring
no special content expertise.

d.  Multiple choice items require considerably less time per
respondent answer than open-ended items.

e.  Multiple choice items put all persons on the same footing
when answering. That is, each person will be able to
consider the same range of alternatives when choosing an
answer.

f.  Multiple choice items are easy to administer.


Figure IV-C-1

Examples of Multiple Choice Items

1.  What do you consider the most important characteristic of
a good helmet? (Check one)

    _____ Comfort
    _____ Stability
    _____ Utility for wash basin
    _____ Protection
    _____ Weight

2.  Which do you prefer, the M16 or the M14 rifle? (Check one)

    _____ M14
    _____ M16
    _____ No preference

3.  Were you able to fire effectively from the frontal parapet
emplacement?

    _____ Yes    _____ No

4.  Which do you prefer, the ABC helmet or the XYZ helmet?

    _____ ABC helmet    _____ XYZ helmet

5.  The M16 is a better rifle than the M14.

    _____ True    _____ False

6.  What is your marital status?

    _____ Single
    _____ Married
    _____ Divorced
    _____ Other (e.g., separated, widowed, etc.)



3.  Disadvantages of Multiple Choice Items

a.  Dichotomous  items  force  the  respondent  to  make  a choice 
even  though  he  may  feel  there  are  no  differences  between 
the  alternatives,  or  he  does  not  know  enough  about  either 
to  validly  choose  one.  Furthermore,  he  is  not  permitted 
to  say  how  much  better  one  alternative  is  than  the  other. 

b.  Two  alternatives  might  not  be  enough  for  some  types  of 
questions.  The  question  designer  may  oversimplify  an 
issue  by  forcing  it  into  two  categories. 

c.  There  may  be  a tendency  for  respondents  to  choose  an 
answer  on  the  basis  of  a response  set.  (See  Chapter  XII). 

d.  Unless  care  is  taken  in  the  construction  of  multiple 
choice  items,  the  response  alternatives  may  overlap. 

e.  The question maker has to know the full range of significant
possible alternatives at the time the multiple choice
question is formulated.

f.  Multiple choice items must be worded with very great care.
Otherwise, the information obtained may not be valid.

g.  With dichotomous items any slight language difficulty or
misunderstanding of even one word could change the answer
from one extreme to another.
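
The overlap problem noted in item d is easiest to see with numeric
response alternatives, such as ranges of income. The Python sketch
below is illustrative only; the function name and the dollar ranges
are hypothetical, and the bounds of each range are assumed to be
inclusive. It reports every pair of ranges that share at least one
value, so that a respondent at that value could check either box.

```python
def find_overlaps(ranges):
    """Return pairs of (low, high) ranges that share at least one value.

    Bounds are treated as inclusive, so (0, 5000) and (5000, 10000)
    overlap at 5000.
    """
    ordered = sorted(ranges)
    overlaps = []
    for i, (low_a, high_a) in enumerate(ordered):
        for low_b, high_b in ordered[i + 1:]:
            # After sorting, low_b >= low_a, so the two ranges
            # overlap exactly when the later one starts at or
            # before the point where the earlier one ends.
            if low_b <= high_a:
                overlaps.append(((low_a, high_a), (low_b, high_b)))
    return overlaps


# Hypothetical annual income alternatives.
overlapping = [(0, 5000), (5000, 10000), (10000, 15000)]
exclusive = [(0, 4999), (5000, 9999), (10000, 14999)]

print(find_overlaps(overlapping))
print(find_overlaps(exclusive))
```

The first set is flagged because a respondent earning exactly 5000 or
10000 fits two alternatives; the second set is mutually exclusive.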

4.  Recommendations Regarding Use

a.  For some purposes the dichotomous or two-way question may be
an improvement over the open-ended question in that it provides
for faster and more economical analysis of data. However, it
requires more care in its development.

b.  Generally speaking, dichotomous multiple choice questions should
be avoided. If used, they should probably be followed up to
determine  the  reason  for  a given  response. 

c.  Nondichotomous  multiple  choice  items  are  popular  and  have  wide 
utility.  They  are  recommended  for  general  use  as  appropriate. 


D.  Rating Scale Items

1.  Definitions  and  Examples 

Rating  scale  items  are  a variation  of  multiple  choice  items. 
They  are  a means  of  assigning  a numerical  value  to  a person's 
judgment  about  some  object.  They  call  for  the  assignment  of 
objects  either  along  an  unbroken  continuum  or  in  ordered 
categories  along  the  continuum.  The  end  result  is  the  attach- 
ment of  numbers  to  those  assignments.  Ratings  may  be  made 
concerning  almost  anything,  including  people,  groups, 
ourselves,  objects,  and  systems. 

There  are  a number  of  different  forms  of  rating  scale 
items, only two of which are shown here. Figure IV-D-1 shows
examples  of  "numerical"  scales.  In  item  1 a sequence  of 
defined  numbers  is  provided  for  the  respondent. 


Figure IV-D-1

Examples of Numerical Rating Scale Items

1.  The cleaning kit for the M16 rifle is

    7  very easy to use.
    6  quite easy to use.
    5  fairly easy to use.
    4  borderline.
    3  fairly difficult to use.
    2  quite difficult to use.
    1  very difficult to use.

2.  How satisfied or dissatisfied are you with the type of
furniture in the barracks?

    _____ Very satisfied
    _____ Satisfied
    _____ Borderline
    _____ Dissatisfied
    _____ Very dissatisfied

3.  The training that I have received at Fort Hood has been

    _____ very challenging.
    _____ challenging.
    _____ borderline.
    _____ unchallenging.
    _____ very unchallenging.


He is to indicate which defined number best fits his judgment
about the object to be rated. Sometimes, the numbers are not
shown on the form used by the respondent (e.g., items 2 and 3).
Instead, the respondent reports in terms of descriptive cues
and the numbers are attached later during analysis. The
numbers assigned are in an arithmetic sequence, such as 5, 4, 3,
2, 1, depending upon the number of response alternatives used.
They are usually assigned arbitrarily unless the response
alternatives have been scaled using one of the procedures
described in Section V-B. The order of perceived favorableness
of commonly used words and phrases is discussed in Chapter VIII.
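
Attaching numbers to descriptive cues during analysis can be sketched
in Python as follows. This is an illustration under assumptions, not
a prescribed procedure: the 5-to-1 assignment is the arbitrary
arithmetic sequence mentioned above, applied to the descriptors of
item 2 in Figure IV-D-1, and the five responses are made up.

```python
from statistics import mean, stdev

# Arbitrary arithmetic sequence attached to the descriptive cues
# of item 2 in Figure IV-D-1 (an assumption for illustration).
SCORES = {
    "Very satisfied": 5,
    "Satisfied": 4,
    "Borderline": 3,
    "Dissatisfied": 2,
    "Very dissatisfied": 1,
}


def score_responses(labels):
    """Convert checked descriptors into their numeric scores."""
    return [SCORES[label] for label in labels]


# Hypothetical responses from five respondents.
responses = ["Satisfied", "Very satisfied", "Borderline",
             "Satisfied", "Dissatisfied"]
scores = score_responses(responses)

print("scores:", scores)
print("mean:", mean(scores))
print("sample standard deviation:", round(stdev(scores), 3))
```

Because the numbers are assigned arbitrarily, summary statistics of
this kind rest on the assumption that the descriptors are roughly
evenly spaced.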

Figure IV-D-2 shows an example of a graphic rating scale.
In the graphic scale, the descriptors are associated with points
on a line or graph, and the respondent indicates his judgment by
marking the point on the line which best fits his rating of the
object. The line can be either horizontal or vertical. The
graphic scale allows the respondent to place his judgment any
place on the line, and thus he is not confined to discrete
categories as he is with the numerical scale. It is, however,
more difficult to score, but this can be facilitated with a
stencil which divides the line into segments to which numbers
are assigned.
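
The stencil amounts to dividing the line into equal segments and
reading off the number of the segment in which the mark falls. A
minimal Python sketch follows; the 140 mm line length and the seven
segments are assumed values chosen for illustration, and the function
name is hypothetical.

```python
def stencil_score(mark_mm, line_length_mm=140.0, segments=7):
    """Return the segment number (1..segments) containing the mark.

    mark_mm is the distance of the respondent's X from the low end
    of the line. A mark exactly on the high end of the line is
    counted in the last segment.
    """
    if not 0 <= mark_mm <= line_length_mm:
        raise ValueError("mark lies off the line")
    width = line_length_mm / segments
    return min(int(mark_mm // width) + 1, segments)


# Hypothetical marks measured from the low end of a 140 mm line.
for mark in (0, 65, 140):
    print(mark, "->", stencil_score(mark))
```

Using more segments gives finer scoring, at the cost of implying
more precision than the respondent's mark may support.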

The  number  of  response  alternatives  to  use  is  discussed  in 
Section  VI-G,  the  order  of  response  alternatives  in  Section  VI-H, 
and  response  anchoring  in  Chapter  VII. 


Figure IV-D-2

Example of Graphic Rating Scale Item

1.  Place an X at the point on the scale that most clearly
represents your opinion about the cleaning kit for the M16 rifle.

[Graphic rating scale: a line with descriptors marked at points
along it. The descriptor labels are illegible in the source.]



2.  Advantages  of  Rating  Scale  Items 

a. When properly constructed, the rating scale reflects both the
direction and degree of attitude or opinion, and the results
are amenable to analysis by conventional statistical tests
(means, standard deviations, etc.).

b.  Graphic  rating  scales  allow  for  as  fine  a discrimination  as 
the  respondent  is  capable  of  giving,  and  the  fineness  of 
scoring  can  be  as  great  as  desired. 

c. Rating scale items usually take less time to answer than do other
types of items.

d.  Rating  scale  items  can  be  applied  to  almost  anything. 

e.  Rating  scale  items  are  generally  more  reliable  than  two-way 
multiple  choice  items.  They  may  be  more  reliable  than 
paired  comparisons  items. 

3.  Disadvantages  of  Rating  Scale  Items 

a.  Rating  scale  items  are  more  vulnerable  to  biases  and  errors 
than  other  types  of  items  such  as  forced  choice  items. 

b. Graphic rating scales are harder to score than other types
of items.

c. The results obtained from the use of rating scale
items may imply a degree of precision/accuracy which is
unwarranted.

4.  Recommendations  Regarding  Use 

The use of rating scale items is highly recommended for most
questionnaires.




E.  Ranking  Items 

1.  Definition  and  Examples 

Ranking  items  call  for  the  respondent  to  indicate  the  relative 
ordering  of  the  members  of  a presented  group  of  objects  on  some 
presumably  discriminable  dimension,  such  as  effectiveness, 
saltiness,  overall  merit,  etc.  By  definition  one  does  not  have 
a scale  by  which  the  amount  of  difference  between  successive 
members  is  measured,  nor  is  it  implied  in  rank  ordering  that 
successive  differences  are  even  approximately  equal.  If 
respondents  were  being  asked  to  give  judgments  on  the  size  of 
intervals,  the  item  would  be  something  more  than  a ranking  item. 

Multiple  choice  items  are  so  frequently  used  that  one  may 
inadvertently  use  this  format  when  the  ranking  item  format  would 
provide  more  complete  and  reliable  information.  Item  1 in 
Figure  IV-C-1  illustrates  this  point.  Since  a preponderance  of 
respondents  would  check  "protection"  as  a helmet's  most  important 
characteristic,  only  a small  remainder  of  responses  would  be 
available  as  a basis  for  ordering  the  other  characteristics. 

Some  of  the  other  characteristics  might  be  achievable  without 
sacrificing  protection,  so  it  would  be  desirable  to  have  a 
reliable  ordering  of  their  importance. 

As the number of objects to be ranked increases, the difficulty
of assigning a different rank to each object increases
even faster. This means that reliability (repeatability) is
reduced. To counter this, one may explicitly permit respondents
to assign tied rankings to objects when the number of objects
exceeds, say, 10.
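
When tied rankings are permitted, some convention is needed for
scoring them. The sketch below uses mid-ranks (each tied object
receives the mean of the rank positions the tied group occupies).
That convention is an assumption of this example, not a procedure
given in this manual, and the function name is hypothetical.

```python
def midranks(tiers):
    """Convert a respondent's rank tiers into mid-ranks.

    tiers[i] is the rank the respondent gave object i (1 = best);
    equal numbers mean the respondent tied those objects.
    """
    order = sorted(range(len(tiers)), key=lambda i: tiers[i])
    ranks = [0.0] * len(tiers)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next object carries the same tier.
        while end + 1 < len(order) and tiers[order[end + 1]] == tiers[order[pos]]:
            end += 1
        shared = (pos + 1 + end + 1) / 2  # mean of the positions spanned
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks
```

With this convention the ranks of n objects always sum to n(n+1)/2,
so totals remain comparable across respondents whether or not they
used ties.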

Examples  of  ranking  items  are  shown  in  Figure  IV-E-1. 

2. Advantages of Ranking Items

a.  The  idea  of  ranking  is  familiar  to  respondents. 

b.  Ranking  takes  less  time  to  administer,  score,  and  code  than 
paired  comparisons  items  do,  and  there  is  some  evidence  that 
the  results  of  the  two  have  a linear  relationship. 

c. Ranking and rating techniques are generally comparable.




Figure IV-E-1
Examples of Ranking Items

1. Rank the following three methods of issuing starlight scopes to
an infantry squad. Assign a "1" to the most effective, a "2" to
the second most effective, etc. Do not assign tied rankings.

Ranking   Basis of Issue

_______   Scopes issued to AMG and SL

_______   Scopes issued to AMG, SL, and one rifleman

_______   Scopes issued to all squad members

2. How important are each of the following factors to you? Assign
a "1" to the most important, "2" to the second most important,
etc. Assign a different number to each of the four factors.

_______   Type of furniture in the barracks

_______   Army pay

_______   Medical service to soldiers

_______   Choice of duty station


3.  Disadvantages  of  Ranking  Items 

a. Ranking items such as item 1 in Figure IV-E-1 do not reveal
the respondent's judgment as to whether any of the objects
are effective or ineffective in an absolute rather than just a
relative sense. To learn this, another question must be asked.

b.  Rank  order  items  do  not  permit  respondents  to  state  the 
relative  amounts  of  differences  between  alternatives. 

c.  The  results  from  ranking  items  are  open  to  question  if  the 
basis  for  ranking  was  not  clear  to  the  respondents. 

d.  Ranking  is  generally  less  precise  than  rating. 

4.  Recommendations  Regarding  Use 

There are some situations where the intent of the questionnaire
developer is best served with the use of one or more ranking items.
Generally, however, rating scale items are probably preferable.


F. Forced Choice Items

1. Definition and Examples

It would appear that any multiple choice item could also be
called a "forced choice" item because, after all, the respondent
is expected to choose one of the response alternatives. The
instructions and/or the presence of an administrator put some
degree of social pressure - social force - on the respondent.
However, if a multiple choice item includes an "I don't know"
response alternative, the pressure/force is almost totally
removed. Likewise, on a rating scale item, the inclusion of a
"neutral" or "borderline" response category allows the
respondent to answer without committing himself.

So,  for  some  questionnaire  developers  - in  particular  those 
who  produce  "forced  choice  self  inventories"  (see  references)  - 
a "forced  choice"  item  strictly  refers  to  one  where  the  respondent 
must  commit  himself  or  herself.  He  may  have  to  select  one  of  a 
pair  of  choices,  or  two  of  three,  or  two  of  four.  These  three 
cases  are  illustrated  in  Figure  IV-F-1. 

2. Advantages of Forced Choice Items

a. Studies have indicated that the reliabilities and validities
obtained from the use of forced choice items compare favorably
with other methods.

b. Studies have also shown that forced choice items are more
resistant than other items to the effects of bias.

c. The forced choice method has been used by a number of
investigators in an attempt to control the tendency of individuals
to answer self-report inventories in terms of response sets
rather than giving "true" responses. (Response sets are
discussed in Chapter XII.)

3. Disadvantages of Forced Choice Items

a. Respondents sometimes balk at picking unfavorable statements,
or at being forced to make a choice.

b. Forced choice items take more time to develop than do other
types of items.

c. Paired comparisons items where all phrases are paired take
more time to administer, score, and code than do ranking items.
Results from the two, however, may have a linear relationship.


Figure IV-F-1

Examples of Forced Choice Items

1. Check the one of the following two statements that is more
characteristic of what you like.

_____ I like to travel.

_____ I like to meet new people.

2. Check the one of the two following statements that is more
characteristic of yourself.

_____ I am honest.

_____ I am intelligent.

3. Look at the following three activities. Mark an "M" by the
one you like the most, and an "L" by the one you like the
least.

_____ Play baseball

_____ Go to the craft shops

_____ Attend boxing or wrestling matches

4. From the following four statements check the two that are
most descriptive of your unit commander.

_____ Serious-minded

_____ Energetic

_____ Very helpful

_____ Gets along well with others


d.  There  is  some  question  as  to  whether  forced  choice  items 
overcome  the  biases  or  errors  they  are  supposed  to  correct. 

e.  Some  investigators  have  concluded  that  the  generalization 
that  self-report  forced  choice  inventories  are  more  valid 
than  single  stimulus  forms  of  the  same  tests  is  not  supported 
by  a critical  consideration  of  the  relevant  evidence. 




Procedures for constructing forced choice items, and evaluative
comments about them, can be found in a number of sources including
the following:

a. Guilford, J. P. Psychometric methods (2nd ed.). New York:
McGraw-Hill, 1954.

b. Nunnally, J. C. Psychometric Theory. New York: McGraw-Hill,
1967, pp 484-435.

c.  Sisson,  E.  D.  Forced  choice — the  new  Army  rating.  Personnel 
Psychology,  1948,  1,  365-381. 

4.  Recommendations  Regarding  Use 

When  test  participants  are  deliberately  given  relevant  experience 
with  the  operation  of  a weapons  system,  vehicle,  or  other  system, 
the  "I  don't  know"  response  alternative  should  normally  be  deleted 
from  items  that  seek  the  participants'  evaluations  of  the  system. 



G.  Card  Sorting  Items/Tasks 

1.  Definition 

With card sorting items/tasks, the respondent is given a large
number of statements (e.g., 75), each on a slip of paper or
card. He is asked to sort them into, say, nine or eleven
piles. The piles are in rank order from "most favorable" to
"least favorable" or "most descriptive" to "least descriptive",
etc., depending upon the dimension to be used. Each
pile usually is to have a specified number of statements
placed into it as required to form a rough normal distribution.
However, some investigators have argued that
forcing a given distribution is not necessary. Ordinarily
each pile is given a score value which is then assigned to
the statements placed into it.
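
The forced distribution and pile scoring just described can be
sketched as follows. This is a minimal illustration, not a
prescribed procedure: the function name, the data layout, and the
default of scoring each statement with its pile number are all
assumptions made for the example.

```python
from collections import Counter

def score_card_sort(placements, quotas, pile_scores=None):
    """Check a sort against the required pile sizes and score it.

    placements: dict mapping statement id -> pile number (1-based)
    quotas: required number of statements in each pile, pile 1 first
    pile_scores: score value per pile; defaults to the pile number
    """
    n_piles = len(quotas)
    if pile_scores is None:
        pile_scores = list(range(1, n_piles + 1))
    for statement, pile in placements.items():
        if not 1 <= pile <= n_piles:
            raise ValueError(f"{statement!r} placed in unknown pile {pile}")
    counts = Counter(placements.values())
    for pile, quota in enumerate(quotas, start=1):
        if counts[pile] != quota:
            raise ValueError(
                f"pile {pile} holds {counts[pile]} statements, expected {quota}"
            )
    # Each statement takes the score value of the pile it was placed in.
    return {s: pile_scores[p - 1] for s, p in placements.items()}
```

If a forced distribution is not wanted, the quota check could simply
be dropped; the scoring step is unchanged.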

An extensive discussion of the use of card sorts (or, more
generally, Q-technique and its methodology) appears in:
Stephenson, W. The study of behavior. Chicago: University
of Chicago Press, 1953.

2.  Advantages  of  Card  Sorting  Items/Tasks 

a. Card sorts appear to be capable of counteracting at least
some of the biasing effects of response sets. (Response
sets are discussed in Chapter XII.)

b. Some investigators believe that card sorting is a fast and
interesting method of obtaining valid and reliable interview
data.

c. With card sorts the respondent can shift items back and
forth if he wishes to do so.

d. The card sort has greatest value when a comprehensive
description of a single individual is desired.

e. Card sorts also have value for obtaining complex descriptions
which can be compared systematically.

f.  They  can  be  used  to  obtain  rating  information  on  any  issue. 

3. Disadvantages of Card Sorting Items/Tasks

a. Card sorting items/tasks may take more time to construct
than other types of items, and they generally take more
time to administer and score.




b.  Card  sorts  are  more  involved  to  administer  than  other 
types  of  questionnaire  items. 

4.  Recommendations  Regarding  Use 

Some  authors  think  that  card  sorting  is  the  method  of  choice 
if  testing  time  is  available.  Its  greatest  value  seems  to  be 
its  ability  to  provide  a comprehensive  description  of  a single 
individual,  or  to  obtain  complex  descriptions  which  can  be 
systematically  compared.  Since  it  is  more  awkward  to  administer 
and  score  than  other  types  of  items,  its  use  in  Army  field  test 
evaluations  is  limited. 


H. Semantic Differential Items

1. Definition and Examples

The  semantic  differential  technique  was  initially  developed  as  a 
general  method  of  measuring  meaning,  and  with  it  the  meaning  of  a 
particular  concept  to  a particular  individual  can  be  specified 
quantitatively.  The  technique  has  also  been  used  to  measure 
attitudes  and  values,  particularly  in  the  marketing  area.  In 
using  the  technique,  the  respondent  is  presented  with  a number  of 
bipolar rating scales, usually but not always with seven points.
Each extreme of a scale is defined by an adjective. The respondent
is given a set of such scales and is asked to rate each of a number
of objects or concepts on every scale. To aid in interpretation,
some  coding  scale  can  be  used,  usually  numbers  in  a direct  numerical 
sequence  such  as  1 through  7.  Other  more  extensive  scoring  can  be 
used,  and  results  can  be  factor  analyzed  to  search  for  the  basic 
dimensions  of  meaning.  However,  the  usefulness  of  the  semantic 
differential  as  a research  tool  stems  from  the  ability  of  the 
procedure  to  probe  into  both  the  content  and  the  relative  intensity 
of  respondents'  attitudes. 
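
Once each scale is coded 1 through 7, a concept's ratings form a
profile, and two profiles can be compared numerically. The sketch
below uses the Euclidean distance between profiles as one simple
measure of how differently two concepts are rated. The choice of
measure and the function name are assumptions of this example; the
manual does not prescribe a particular statistic.

```python
import math

def profile_distance(ratings_a, ratings_b):
    """Euclidean distance between two semantic differential profiles.

    Each argument lists one respondent's coded ratings (e.g., 1-7) of a
    concept, in the same scale order. Smaller values mean the two
    concepts were rated more alike.
    """
    if len(ratings_a) != len(ratings_b):
        raise ValueError("profiles must use the same set of scales")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ratings_a, ratings_b)))
```

The same function could be applied to one concept rated at two
different times, giving a rough index of change.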

Examples of semantic differential items are given in Figure IV-H-1.
A recommended text on the semantic differential is Osgood, C. E.,
Suci, G. J., & Tannenbaum, P. H. The measurement of meaning. Urbana,
Ill.: University of Illinois Press, 1957. Norms have been collected
on 20 scales for 360 words. They are reported in Jenkins, J. J.,
Russell, W. A., & Suci, G. J. An atlas of semantic profiles for 360
words. American Journal of Psychology, 1958, 71, 688-699.

2.  Advantages  of  Semantic  Differential  Items 

a.  Evidence  on  the  validity,  reliability,  and  sensitivity  of  the 
scales  has  been  offered. 

b.  Using  some  adjectives  that  do  not  seem  appropriate  to  the 
concept  under  investigation  may  uncover  aspects  that  reflect 
an  attitude  or  feeling  tone  even  though  the  respondent  cannot 
put  it  into  words. 

c.  Semantic  differential  items  can  be  used  to  study  the  relative 
similarity  of  different  concepts  to  the  respondent,  and  to 
study  changes  over  time. 

d.  Semantic  differential  items  are  relatively  easy  to  construct, 
administer,  and  score. 




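Figure IV-H-1

Examples of Semantic Differential Items

1. Place an X in each of the following rows to describe your
feelings about the M16 rifle.

Reliable   ______________________________   Unreliable

Heavy      ______________________________   Light

Good       ______________________________   Bad

Slow       ______________________________   Fast

Adequate   ______________________________   Inadequate

2. Place an X in each of the following rows to describe your
feelings about the ABC helmet.

Reliable   ______________________________   Unreliable

Heavy      ______________________________   Light

Good       ______________________________   Bad

Slow       ______________________________   Fast

Adequate   ______________________________   Inadequate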


3. Disadvantages of Semantic Differential Items

a. If care is not taken, the two adjectives chosen for the
extremes will not define some kind of scale or dimension
between them.

b. The value of semantic differential items depends on the
suitable choice of the bipolar adjectives and concepts.

c. There is a potential response error present in the
respondents' interpretations of the meaning of the polar
descriptions. However, there appears to be a balancing
out over a number of administrations.

d. The semantic differential is complex to score and analyze
using the traditional procedures.




4. Recommendations Regarding Use

There are a number of investigators who advocate the use of
the semantic differential. Others, however, have questioned
whether it may be a rather complicated way of developing a
measure that is more readily and reliably secured by other
means. It is reasonable to assume that the technique could
easily be expanded to identify attitudes and the intensity
of the attitudes toward the attractiveness of a particular
military specialty, the capacities of a specific piece of
equipment to perform, or any other characteristic set which
can be described by bipolar adjectives. However, since the
analysis of sets of semantic differential items is somewhat
involved, the technique has not been widely used for routine
Army field test evaluations.



I.  Other  Types  of  Items 
1.  Check  Lists 

Check  lists  are  instruments  in  which  responses  are  made  by 
checking  the  appropriate  statement  or  statements  in  a list 
of  statements.  Examples  are  shown  in  Figure  IV-I-1. 


Figure IV-I-1
Examples of Check Lists

1. Which of the following are important to consider when deciding
whether or not to make a career of the Army? Check all that
apply.

_____ Leadership of NCO's

_____ Opportunity for promotion

_____ Playboy magazines in the Post Exchange

_____ Latrine in crafts shops

_____ Army pay

_____ Choice of duty stations

_____ Civilian opinion of Army

_____ Reenlistment bonuses

_____ Hours of work in a work week

2. Please check all the characteristics which Backpack A possesses.

_____ Durability

_____ Lightness

_____ Wearing comfort

_____ Accessibility of items

_____ Ease of putting on and taking off

_____ Other (specify:) _____



Compared  to  rating  scales,  which  give  a numerical  value  to 
some  sort  of  judgment,  check  lists  are  relatively  crude. 

They  are,  however,  quite  useful  when  rating  information  is 
not  needed  or  when  information  is  needed  regarding  which  of 
a number  of  attitudes  are  significant  to  a respondent. 

Other  issues  regarding  the  use  of  check  lists  are  as  follows: 

a. Check lists should use terms like the respondent uses.

b. Response set can be somewhat controlled if the respondent
is asked to check a stated number of items, or if upper
or lower limits are set.

c. There is some evidence that a higher rate of claim or
assertion is obtained from check lists than from open-ended
items.

d. It is usually not known if check lists cover the appropriate
attributes.

e. Adjective check lists are sometimes used, especially to
elicit stereotypes about people or nations. They are
similar to rating scales.
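
Check list responses are usually summarized as the proportion of
respondents checking each statement. The sketch below also applies
the optional upper and lower limits mentioned in item b. Excluding
forms that violate the limits is an assumption of this example (an
analyst might prefer to follow up with the respondent instead), and
the function and parameter names are hypothetical.

```python
from collections import Counter

def tally_checks(responses, min_checks=None, max_checks=None):
    """Proportion of usable respondents who checked each statement.

    responses: one set of checked statements per respondent.
    Forms with fewer than min_checks or more than max_checks marks
    are left out of the tally.
    """
    counts = Counter()
    used = 0
    for checked in responses:
        n = len(checked)
        if min_checks is not None and n < min_checks:
            continue
        if max_checks is not None and n > max_checks:
            continue
        counts.update(checked)
        used += 1
    if used == 0:
        return {}
    return {item: counts[item] / used for item in counts}
```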

2. Matching Items

With matching items, the respondent is given two columns of
items and is asked to pair each item in the first column with
an associated item in the second. In general, it is not
desirable to have the same number of items in each column.
Each set of items should be homogeneous, and
any item in the second column should look like it could go
with any item in the first column.

Matching items are best used in achievement testing.
Since they have little utility in Army field test evaluations,
they are not discussed in greater detail.

3. Arrangement Items

With an arrangement item, a number of statements are presented
in random order, and the respondent arranges them in a given
way. For example, steps in a sequence of events or procedures
may be rearranged in order of occurrence or performance. Or,
causes may be rearranged in order of importance in bringing
about a certain effect.




There may be some situations where arrangement items may
be useful in Army field test evaluations; however, the
scoring of the items is difficult. The use of such items
is, therefore, extremely limited.

4.  Formats  Providing  for  Supplementary  Responses 

The questionnaire writer is not limited to the major item formats
described in this chapter. Formats providing for supplementary
responses can also be used. Examples are shown in Figure IV-I-2.




Figure IV-I-2

Examples of Formats Providing for Supplementary Responses

1. The starlight scope is able to detect aggressor movements:

_____ very effectively.

_____ effectively.

_____ borderline.

_____ ineffectively.

_____ very ineffectively.

Explain: _____

2. What style of leadership was used by the most effective squad
leader you served under? (Check one)

_____ democratic and friendly

_____ friendly with most; authoritarian with the others

_____ sometimes authoritarian; sometimes acts like one of the
men

_____ usually authoritarian; avoided making close friends

_____ other (please describe) _____




Notice that the extra response alternative in Example 2
allows the respondent in effect to make an open-ended item
out of a multiple choice item. Few test respondents, however,
elect to do this. Inclusion of the supplementary or
write-in option commits you to extra data reduction and
analysis effort that would have been unnecessary had you
anticipated and included all reasonable response
alternatives.




Chapter  V:  Attitude  Scales  and  Scaling  Techniques 

A.  Overview 


At times the questionnaire developer will wish to treat the total
group of items on a questionnaire as a single measuring scale, and
from them obtain a single overall score on whatever he is interested
in measuring. This is a common practice, especially with the
measurement of attitudes. A typical attitude scale is composed of
a number of questions/statements selected and put together from a
much larger number of questions/statements according to certain
statistical procedures. Some of these procedures, called scaling
techniques, are discussed in this chapter.

A distinction is needed, however, between two ways in which the
term scale is used in this manual. An attitude scale could be
constituted of items each one of which employs a response scale.
Aspects of response scales are discussed in Chapter VII on "Response
Anchoring." A component of score would be achieved on each item.
Adding these item scores together - which means considering the
whole set of items as a scale - produces a total attitude score for
the individual respondent.

There are, generally speaking, two methods for the
construction of scales such as attitude scales. The first method
makes use of a judging group and one of the psychological scaling
methods developed by Thurstone, as discussed in Section V-B. It
results in a set of statements being assigned scale values on a
psychological continuum. The continuum may be
favorableness-unfavorableness, like-dislike, or any other judgment.
The psychological scaling methods, therefore, have considerably
greater application than for the scaling of attitudes. They can be
used to scale statements or objects. They have been used, for
example, to determine the perceived favorableness of words and
phrases commonly used as rating scale response alternatives, as
discussed in Chapter VIII.

The second general method is based on the direct responses of
agreement or disagreement with attitude statements and does not
result in a set of statements being assigned scale values on a
psychological continuum. Both the Likert and Guttman scales
discussed in Sections V-C and V-D are examples of this latter method.

For information (relating to attitude scaling and scaling
techniques) beyond that contained in this manual the following
references may be consulted.

1. Edwards, A. L. Techniques of attitude scale construction.
New York: Appleton-Century-Crofts, 1957.

2. Guilford, J. P. Psychometric methods (2nd ed.). New York:
McGraw-Hill, 1954.

3. Gulliksen, H., & Messick, S. (Eds.). Psychological scaling: Theory
and applications. New York: John Wiley, 1969.

4. Lemon, N. Attitudes and their measurement. New York: John Wiley,
1974.

5. Nunnally, J. C. Psychometric Theory. New York: McGraw-Hill, 1967.

6. Thurstone, L. L. The measurement of values. Chicago: University
of Chicago Press, 1959.

7. Torgerson, W. S. Theory and methods of scaling. New York: John
Wiley, 1958.


B.  Thurstone  Scales 




This  section  discusses  three  scaling  methods  developed  by  L.  L.  Thurstone. 

For  additional  detail,  see  the  texts  referred  to  in  Section  V-A. 

1.  Method  of  Equal  Appearing  Intervals 

Thurstone's method of equal appearing intervals was the first
major method of attitude scaling to be developed. It was
assumed that a group of statements of opinion about a particular
issue could be ordered on a continuum of
favorableness-unfavorableness, and that the ordering could be such
that there appears to be an equal distance between the adjacent
statements on the continuum.

The  following  steps  are  followed  in  the  method  of  equal 
appearing  intervals: 

a. From the literature or pilot interviews, a large number
of statements (100 to 200) are compiled about the attribute
or object of an attitude under study. Irrelevant,
ambiguous, or poorly worded statements would not be
selected.

b. A number of judges, at least 50, are obtained. They
should be similar to those individuals who will respond
to the final statements on the questionnaire. The judges
independently sort each statement into one of 11 piles.
The first pile is defined as "Unfavorable" or "Most
unfavorable," the middle or sixth pile is defined as
"Neutral," and the eleventh pile is defined as "Favorable"
or "Most favorable." The other piles are left
undefined. The judges are told that the intervals
between piles or categories are to be regarded as
subjectively equal. They are also instructed to ignore
their own agreement or disagreement with each item, and
to judge each item in terms of its degree of
favorableness-unfavorableness.

c. The scale value for each item is usually determined by
computing its mean or median, over all judges.

d. Twenty to 25 statements with little dispersion in their
scale values are then selected for use. The statements
are selected so that the intervals between statements'
scale values are approximately equal and/or are relatively
equally spaced on the psychological continuum.




e. The finally selected statements are usually placed in
random order for presentation to respondents. The
respondent is asked to indicate which statements he
agrees with, and which he disagrees with.

f.  The  respondent's  score  is  the  mean  or  median  scale  value 
of  those  statements  for  which  he  marked  "Agree." 
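
The arithmetic in steps c, d, and f can be sketched as follows. This
is an illustration only. It uses the median as the scale value and
the standard deviation of the judges' placements as the measure of
dispersion; the latter is an assumption of this example, as are the
function names and the cutoff passed to select_statements. The
equal-spacing requirement in step d is not automated here.

```python
from statistics import median, pstdev

def scale_values(judgments):
    """Step c: scale value and dispersion for each statement.

    judgments: dict mapping statement -> list of pile numbers (1-11),
    one entry per judge. Returns statement -> (median, spread).
    """
    return {s: (median(piles), pstdev(piles)) for s, piles in judgments.items()}

def select_statements(values, max_spread):
    """Step d (in part): keep statements the judges agreed on."""
    return [s for s, (_, spread) in values.items() if spread <= max_spread]

def respondent_score(agreed, values):
    """Step f: median scale value of the statements marked "Agree"."""
    return median(values[s][0] for s in agreed)
```

For example, if judges placed statement A in piles 2, 3, 3, 4 and
statement B in piles 9, 10, 10, 11, the scale values are 3 and 10,
and a respondent agreeing with both would score 6.5.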

Some  considerations  for  use  of  the  Equal  Appearing 
Intervals  method  are: 

a. The method of equal appearing intervals is designed to
provide an interval scale as its output. The scale is
at least ordinal (ranked).

b. The method is useful when there are a large number of
statements involved.

c. Scale values from widely differing groups of judges appear
to correlate highly with one another so long as judges
with extreme views are eliminated.

d.  Graphic  or  numerical  rating  scales  can  be  used  by  the 
judges  instead  of  having  the  statements  sorted  into 
piles.  Though  11  categories  are  usually  used,  some 
other  number  can  be  employed. 

2.  The  Method  of  Paired  Comparisons 

Thurstone developed a procedure for deriving an interval
scale based upon what has been called the Law of Comparative
Judgment. Basically, it is a method by which statements such
as "A is stronger than B," "B is stronger than C," etc., are
used to provide a scale with interval properties. The objects
or statements to be ranked are presented two at a time, and the
respondent is asked to choose between them. All possible
combinations of pairs have to be presented. Hence the procedure
becomes very cumbersome when there are more than 15 or
so items. The determination of scale values is also laborious.
Since the procedure is not used much in applied research,
additional detail is not presented here.
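
The reason the procedure becomes cumbersome is easy to see by
counting: n items produce n(n-1)/2 distinct pairs. The short sketch
below lists the pairs for a set of items; the function name is
hypothetical and the example is only meant to show how quickly the
number of judgments grows.

```python
from itertools import combinations

def all_pairs(items):
    """Every unordered pair of items, each pair listed once."""
    return list(combinations(items, 2))
```

Four items require 6 comparisons, 15 items require 105, and 30 items
would require 435 from each respondent.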

3. The Method of Successive Intervals

The method of successive intervals is similar to the method of
equal appearing intervals. However, no assumption is made
concerning the psychological equality of the category intervals.
It is only assumed that the categories are in correct rank
order and that their boundary lines are relatively stable.
The procedure involves estimating the widths of the
categories along the psychological continuum, and, from
these reference points, the scale values of the statements
can be obtained. Research has shown that there is a linear
relationship between scales constructed by the method of
paired comparisons and by the method of successive intervals.


C.  Likert  Scales 




The Likert method of scale construction was developed because the
Thurstone procedures require extensive work and make assumptions
regarding the independence of item statements. The Likert method
assumes that all statements reflect the same attitude dimension
and are hence related to each other. The Likert approach does not
assume equal intervals between the scale values. It is sometimes
called the method of summated ratings.

The  steps  in  Likert  scale  construction  are  as  follows: 

1. Statements are classified in advance as "Favorable" or
"Unfavorable." No attempt is made to find an equal
distribution of statements over the whole range of the attitude
of concern, and no attempt is made to scale the statements.

2. A pretest is then conducted. In the pretest the respondents
indicate their degree of agreement with every statement,
usually using five response alternatives: strongly agree,
agree, undecided, disagree, and strongly disagree.

3. Each descriptor is assigned a numerical weight (e.g., +2,
+1, 0, -1, -2) usually based on a given series of integers
in arithmetical sequence.

4. Each respondent is assigned a score that represents the
algebraic summation of weights associated with each item
checked. In the scoring process weights are assigned such
that the direction of attitude, favorable to unfavorable,
is consistent over items. For example, if a +2 is
assigned to "Strongly agree" for favorable statements, a
-2 should be assigned to "Strongly agree" for unfavorable
statements.

5.  The  statements  finally  selected  for  use  in  the  questionnaire 
are  those  which  appear  to  discriminate  best  between 
respondents  with  the  highest  and  lowest  total  scores. 

Usually  about  half  of  the  statements  are  favorable,  half 
unfavorable. 

6.  In  the  final  questionnaire,  a score  is  obtained  by  summing 
the  numerical  weights  assigned  to  the 
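
The scoring and item-selection steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the weights are the +2 to -2 series given as an example in step 3, and the comparison of the top and bottom quarter of respondents is one possible way of judging which statements "discriminate best"; the manual does not prescribe a particular fraction or statistic.

```python
from statistics import mean

# Weights for the five response alternatives (step 3).
WEIGHTS = {
    "strongly agree": 2,
    "agree": 1,
    "undecided": 0,
    "disagree": -1,
    "strongly disagree": -2,
}


def item_weight(response, favorable):
    """Weight for one response; the sign is reversed for unfavorable statements."""
    w = WEIGHTS[response.lower()]
    return w if favorable else -w


def total_score(responses, favorable_flags):
    """Algebraic sum of item weights for one respondent (step 4)."""
    return sum(item_weight(r, f) for r, f in zip(responses, favorable_flags))


def discrimination(all_responses, favorable_flags, fraction=0.25):
    """Per-item difference in mean weight between the highest- and
    lowest-scoring groups of respondents (an assumed reading of step 5)."""
    ranked = sorted(all_responses, key=lambda r: total_score(r, favorable_flags))
    n = max(1, int(len(ranked) * fraction))
    low, high = ranked[:n], ranked[-n:]
    result = []
    for i, fav in enumerate(favorable_flags):
        high_mean = mean(item_weight(r[i], fav) for r in high)
        low_mean = mean(item_weight(r[i], fav) for r in low)
        result.append(high_mean - low_mean)
    return result
```

Statements with the largest differences would be the candidates to retain; a difference near zero suggests the statement does not separate high scorers from low scorers.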




Factors to be taken into consideration when deciding
whether to use Likert scales include:

1. Likert scales take less time to construct than Thurstone
scales.

2.  It  is  possible  to  construct  scales  by  the  Likert  and 
Thurstone  methods  which  will  yield  comparable  scores. 

3.  Likert  scales  have  only  ordinal  properties.  If  there 
is  a large  dispersion  about  a respondent's  mean  score, 
however,  even  those  properties  have  limited  meaning. 

If  the  sole  purpose  of  a scaling  procedure  is  to  rank 
respondents  according  to  the  degree  to  which  they  hold 
some  attitude,  then  Likert  scales  are  efficient  because 
of  their  ease  of  administration. 

4. In addition to lacking metric properties, Likert summated
scores lack a neutral point. The interpretation of a
score cannot be made independently of the distribution
of scores of some defined group. However, percentile
or deviation-type norms can be calculated if the sample
size is large enough.

5.  For  the  same  number  of  items,  scores  from  Likert  scales 
may  be  more  reliable  than  scores  from  Thurstone  scales. 
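
Because a summated score has meaning only relative to a defined group (factor 4 above), a short sketch of percentile and deviation-type norms may help. The function names are hypothetical, counting ties as half is one common convention rather than something the manual specifies, and the population standard deviation is used on the assumption that the reference group itself is the group of interest.

```python
from statistics import mean, pstdev


def percentile_rank(score, group_scores):
    """Percent of the reference group scoring below `score`,
    counting ties as half (one common convention)."""
    below = sum(1 for s in group_scores if s < score)
    ties = sum(1 for s in group_scores if s == score)
    return 100.0 * (below + 0.5 * ties) / len(group_scores)


def standard_score(score, group_scores):
    """Deviation-type norm: distance from the group mean in standard deviations."""
    sd = pstdev(group_scores)
    if sd == 0:
        raise ValueError("all scores in the reference group are identical")
    return (score - mean(group_scores)) / sd
```

Either figure is only as stable as the reference group is large, which is the point of the manual's caution about sample size.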


D.  Guttman  Scales 




Guttman's approach to scaling is called scalogram or scale analysis.
It is a deterministic model; it treats its scales as close to
being rulers (measures of length). The essence of the method is to
determine whether a series of statements can be appropriately
scaled. An attempt is made to identify a set of statements which
actually reflect a unidimensional scale and have a cumulative
nature. When the goal is achieved, two or more persons receiving
the same score will have responded in the same way to all of the
statements.

As an example, the following four questions comprise a Guttman
scale:

                                                      Yes   No

a. The United Nations is mankind's savior             ___   ___

b. The United Nations is our best hope for peace      ___   ___

c. The United Nations is a constructive force in
   the world                                          ___   ___

d. We should continue our participation in the
   United Nations                                     ___   ___


The  expected  pattern  of  responses  to  these  questions  is  "triangular". 


             Person

Item     1    2    3    4

a        x

b        x    x

c        x    x    x

d        x    x    x    x

This means that, for any person who answers yes to item "a", there
is a high probability that he will answer yes to the other items.
A person who says no to "a" but yes to "b" has a high probability
of answering yes to the other items, and so on.




The major steps in scalogram analysis are too complex to
summarize here, but are found in some of the references in Section V-A.
Procedures are available for:

1.  Measuring  the  amount  of  error  due  to  imperfect  scalability. 

2.  Ordering  the  statements  so  that  the  response  patterns  provide 
the  least  amount  of  error. 

3.  Determining  the  extent  to  which  the  data  approximate  the  perfect 
case. 

4.  Improving  the  scalability  of  the  statements  via  category 
combinations,  statement  discarding,  etc. 
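
As a rough illustration of the first and third procedures, the sketch below computes a coefficient of reproducibility, 1 - errors / (respondents x items), for a matrix of yes/no answers. The manual does not give a formula; the error-counting rule used here (compare each person's answers with the ideal cumulative pattern implied by his total score) is one common convention and is an assumption, as is the function name. Ties in item popularity are broken by item position.

```python
def reproducibility(responses):
    """Coefficient of reproducibility for a matrix of 0/1 answers.

    `responses` holds one list per person; columns are items.
    Items are ordered from most to least endorsed. The ideal pattern
    for a person with total score s is "yes" to the s most endorsed
    items, and every cell that departs from that pattern is one error.
    """
    n_people = len(responses)
    n_items = len(responses[0])
    # Order items by popularity (most "yes" answers first).
    popularity = [sum(person[j] for person in responses) for j in range(n_items)]
    order = sorted(range(n_items), key=lambda j: -popularity[j])

    errors = 0
    for person in responses:
        score = sum(person)
        ordered = [person[j] for j in order]
        ideal = [1] * score + [0] * (n_items - score)
        errors += sum(1 for a, b in zip(ordered, ideal) if a != b)
    return 1.0 - errors / (n_people * n_items)
```

A perfectly "triangular" response pattern, such as the United Nations example, gives a coefficient of 1.0; each departure from the cumulative pattern lowers it.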

There  have  been  many  critics  of  scalogram  analysis.  Some  feel 
that  there  is  no  really  effective  way  of  selecting  good  items  by  this 
approach.  However,  the  procedure  is  considered  useful  if  one  is 
concerned  with  unidimensionality  or  if  one  wishes  to  examine  small 
changes  in  attitudes.  It  is,  however,  laborious.  No  instances  of 
past  use  in  field  testing  situations  are  known. 




E. Other Scaling Techniques

Numerous  other  scaling  techniques  and  combinations  of  methods 
are  reported  in  the  literature.  A discussion  of  them  is,  however, 
outside  the  current  scope  of  this  manual. 




Chapter VI: Preparation of Questionnaire Items


Overview 

Once a decision has been made [...] the actual development of
the items. This chapter [...] difficulty of items; length of
question stem; [...] anchoring is considered in Chapter VII.

[...] the question. (They are sometimes called options.) The
stem is that part of the item that comes before the response
alternatives.




B.  Mode  of  Items 

Questionnaire items are usually presented to a respondent in
printed form. However, it is possible to present items or stimuli
pictorially. There is some evidence that there are no significant
differences in subjects' responses to verbal and pictorial formats.
Using a pictorial format may facilitate obtaining responses from
respondents with limited verbal comprehension, who might have
difficulty responding to questions employing lengthy definitions of
concepts or objects. If pictures are used, they should be
pretested for clarity of their presentation of the concept or object
to be evaluated.

In  cases  where  it  is  known  that  the  respondents  have  very  low 
reading  ability,  it  may  be  desirable  to  present  the  questionnaire 
orally.  A tape  player-recorder  may  be  used  for  this  purpose  also. 




C. Wording of Items


The  wording  of  questionnaire  items  is  a critical  consideration  in 
obtaining  valid,  relevant,  and  reliable  responses.  Consider,  for 
example,  the  following  three  questions  that  were  administered  by 
Payne  (see  reference  below)  to  three  matched  groups  of  respondents: 


a.  "Do  you  think  anything  should  be  done  to  make  it  easier  for 
people  to  pay  doctor  or  hospital  bills?" 


b. "Do you think anything could be done to make it easier for
people to pay doctor or hospital bills?"


c.  "Do  you  think  anything  might  be  done  to  make  it  easier  for 
people  to  pay  doctor  or  hospital  bills?" 


These questions differed only in the use of the words "should,"
"could," or "might," terms that are often used as synonyms even
though they have different connotations. The percentages of "Yes"
replies to the three questions were 82, 77, and 63, respectively.
The difference of 19 percentage points between the extremes is
probably enough to alter the conclusions of most studies.


A number of matters related to the wording of questionnaire items
are considered in this section. Some of the suggestions made are
based upon experimental research. Others are based upon experience,
intuition, and common sense. Several sources offering principles
of question wording are:

a. Roslow, S., & Blankenship, A. B. Phrasing the question in
consumer research. Journal of Applied Psychology, 1939, 23,
612-622.

b. Jenkins, J. G. Characteristics of the question as
determinants of dependability. Journal of Consulting Psychology,
1941, 5, 164-169.

c. Blankenship, A. B. Psychological difficulties in measuring
consumer preferences. Journal of Marketing, 1942, 6, 66-75.

d. Payne, S. L. The art of asking questions (Rev. ed.).
Princeton, N. J.: Princeton University Press, 1963.


1. Formulation of the Question or Question Stem




a.  General  comments  regarding  items  and  question  stems. 
Issues  that  should  be  noted  concerning  the  general 
structure  of  questions  and  question  stems  are: 

(1)  Question  stems  may  be  in  the  form  of  an  incomplete 
statement,  where  the  statement  is  completed  by  one 
of  the  response  alternatives,  or  in  the  form  of  a 
complete  question.  See  Figure  VI-C-1  for  examples. 


Figure VI-C-1

Example of Question Form and
Incomplete Statement Form of Stem

1. How qualified or unqualified for their jobs are most Army
NCO's? (Check one.)

   Very well qualified
   Qualified
   Borderline
   Unqualified
   Very unqualified

2. Check one of the following. Most Army NCO's are:

   Very well qualified for their jobs.
   Qualified for their jobs.
   Borderline.
   Unqualified for their jobs.
   Very unqualified for their jobs.


The choice between these two methods should depend on
which of the two permits simpler and more direct
wording for the item in question. Not all of the items in
a questionnaire need to be in the same form.




(2) All questionnaire items should be grammatically correct.

(3) All stems should be as neutrally expressed as possible,
and the respondent should be permitted to indicate or
select the direction of his preference. If this is not
done, the stems may influence the response distribution.
If items cannot be expressed neutrally, then alternate
forms of the questionnaire should be used.

(4) A respondent may not answer an item if he is not able
to give the information requested. Therefore, care
should be exercised in the wording of the question,
so that it does not call for information not possessed
by the respondents.

b. Accuracy and completeness of question stems.


(1)  The  stem  of  an  item  should  be  accurate,  even  though 
inaccuracies  may  not  influence  the  selection  of  the 
response  alternative. 

(2) The question stem, in conjunction with each response
alternative, should present the question as fully as
necessary to allow the respondent to answer. It
should not be necessary for the respondent to infer
essential points. An example of an insufficiently
informative question stem is given as item 1 in
Figure VI-C-2. It is insufficient in that no
specification is given as to who should carry the
scopes. (The response alternatives are also
insufficient since the respondent is not allowed to say
"None.") Two or three questions might be needed to
obtain all the information desired. Item 2 in
Figure VI-C-2 is one revision that makes the question
stem sufficient.

(3)  Generally,  materials  which  are  common  to  all  response 
alternatives  should  be  contained  in  the  stem,  if  this 
can  be  done  without  the  need  for  awkward  wording. 

(4) In forming questions which depend on respondents'
memory or recall capabilities, the time period a
question covers must be carefully defined. The
"when" should be specifically provided.




Figure VI-C-2

An Insufficiently Detailed Question Stem, Plus Revision

1. How many starlight scopes should be issued to a rifle squad?

   1
   2
   3
   4
   5

2. Place a check in front of each squad member's "name" below
that you believe should be issued a starlight scope:

   Squad Leader
   Fire Team 1 Leader
   Automatic Rifleman
   Grenadier
   Rifleman

   Fire Team 2 Leader
   Automatic Rifleman
   Grenadier
   Rifleman
   Rifleman


(5)  Question  stems  and  response  alternatives  should  be 

worded  so  that  it  is  clear  what  the  respondent  meant. 
Consider  the  question  "Should  this  cap  be  adopted,  or 
its  alternate?"  If  the  respondent  answers  "Yes,"  it 
would  still  be  unclear  which  cap  ("this  cap"  or  its 
alternate)  should  be  adopted. 

c.  Positive  versus  negative  wording. 

(1)  Alternative  wording  can  produce  demonstrable  effects 
on  survey  results. 

(2)  There  may  be  a tendency  for  the  direction  of  the 
question  stem  to  be  chosen  in  the  response  alternative. 

(3)  Studies  have  indicated  that  it  is  usually  undesirable 
to  include  negatives  in  question  stems  (unless  an 
alternate  form  with  positives  is  also  used  for  half  of 
the  respondents). 



(4)  Questions  worded  in  positive  terms  are  preferable  to 
questions  in  negative  terms  (if  alternate  forms  are 
not  being  used).  Questions  worded  negatively  may  be 
confusing,  or  negative  words  may  be  overlooked. 

(5)  If  it  seems  necessary  to  have  a particular  question 

in  negative  form,  the  negative  word  (e.g.,  not,  never) 
should  be  underlined  or  italicized.  Care  should  also 
be  taken  that  there  are  no  double  negatives,  as  they 
are  frequently  misinterpreted. 

(6)  A question  worded  in  negative  terms  can  often  be 
improved  by  rephrasing  it  in  positive  terms. 

d. Definite versus indefinite article wording. The indefinite
articles, "a" or "an," would be used in a question such as
"Did you see a demonstration of the new night vision device?"
A comparable question using the definite article "the" would
be, "Did you see the demonstration of the new night vision
device?" There is some evidence that changing from "a" to
"the" reduces the level of suggestibility of an item. However,
there is not enough evidence to warrant a firm conclusion.

e.  First,  second,  and  third  person  wording.  An  example  of  a 
statement  written  in  the  first  person  is,  "Army  NCO's  are 
understanding  of  my  needs  and  problems."  A statement  in 
the  second  person  is,  "Army  NCO's  are  understanding  of 
your  needs  and  problems,"  while  one  in  the  third  person  is, 
"Army  NCO's  are  understanding  of  the  needs  and  problems  of 
their men." It is preferable that the framework of
questions be consistent for all questions in a questionnaire,
so that responses are comparable. A respondent's opinion
of the effects of events affecting his own person is often
quite different than his opinions of the effects of the
same events on others. Hence, questions written in the
first  or  second  person  may  elicit  entirely  different 
responses  than  the  "same"  question  written  in  the  third 
person. 


There are occasions where each person (first, second,
or third) is appropriate. For example, the third person
should probably be used when it is desired to elicit
information that might be considered too personal for a
person to answer about himself. The third person may also
be used in attempts to elicit information about the
feelings inherent in a minority of respondents, but about
which many more respondents may be aware, such as in the
statement, "The Army is ahead of most areas of civilian
life in reducing racial discrimination." In other cases
the first or second person form is not applicable, such
as in "The Army is essential for the defense of the
country." Also, the use of the third person permits a
far larger number of personnel to answer the questions,
since some first person questions that are inapplicable
to many individuals become applicable when in the third
person. Instances may occur where a respondent is asked
a question twice, once to discover how he personally
feels about the issue (using first or second person),
and then to discover what he judges others' feelings
on that issue are (using the third person). Generally,
however, the use of the third person appears preferable.

f. Loaded and leading questions. Loaded and leading
questions should be avoided. Although the questionnaire
writer may not deliberately attempt to distort the
distribution of responses, he may sometimes do so
unintentionally.

In Figure VI-C-3, item 1 should be revised to maintain
neutrality by removing the adjectives applied to the rifles.
It is true that the M-16 weighs less and fires more rounds
faster, but there are other characteristics (accuracy,
lethality given a hit, etc.) that are not cited. Hence,
the question is loaded because it only presents some of
the data relevant to comparing the rifles.

Items 2 and 3 in Figure VI-C-3 show loading of a
different type. In item 2, analysis of the available
alternatives leaves the impression that the writer of
the question thinks at least some should not have a full
automatic selector. Analysis of the alternatives in
item 3 leads to the suspicion that the writer of the
question believes there should be at least one grenade
launcher in the rifle squad, since a response alternative
of zero grenade launchers was not provided.

There are many additional ways that questions can be
loaded. One way is to provide the respondent with a
reason for selecting one of the alternatives, as with the
question, "Should we increase taxes in order to get better
schools, or should we keep them about the same?" A
question can also be loaded by referring to some prestigious
individual or group, as in, "A group of experts has
suggested...Do you approve of this, or do you disapprove?"




Figure VI-C-3

Examples of Loaded Questions

1. Which rifle do you prefer, the lighter, faster shooting M16
or the heavier, slower firing M14?

   M16
   M14

2. Should every rifleman in the rifle squad have a full automatic
selector on his rifle?

   Yes
   No

   If no, how many should?

3. How many grenade launchers (M79) do you desire in the rifle
squad?

   1
   2
   3
   4 or more


Leading questions are similar to loaded questions.
Two examples are shown in Figure VI-C-4. The problem is
that most people are reasonably cooperative and like to help.
If they can figure out what is wanted, they will often try
to comply. The items in Figure VI-C-4 were actually used in
the collection of data in a field test. As might be expected,
the impression received from an analysis of the results is that
men are, in general, highly motivated, and use good noise
discipline during movement. (These items also allow
respondents to avoid criticizing, and to give socially desirable
answers.)




Figure VI-C-4

Examples of Leading Questions

1. Do you think your men were pretty highly motivated on this
exercise?

   Yes
   No

2. Were they pretty good at using good noise discipline during
movement?

   Yes
   No


The best way to avoid loaded questions is to find a
devil's advocate to review them or to pretest the items on
someone who holds opposite or minority views. Another
check is to ask yourself what you think, what someone
who disagrees with you would think, and whether your
response alternatives would give him a chance to present
his views.

There are times when loaded questions probably should
be used. This is when, without loading, the question
would pose an ego-threat to the respondent, so that he
might give an untruthful reply. The loading removes the
ego-threat so that a more valid response can be obtained.
An example might be, "Many people are not able to get as
much schooling as they would like. What was the last
grade you completed in school?"

g. Embarrassing or self-incriminating questions. Respondents
should not be asked embarrassing or self-incriminating
questions. Consider the question, "Did you clean your
weapon regularly in Vietnam?" It is asking respondents
who did not clean their rifles regularly to expose
themselves to possible embarrassment. Thus, one would
expect the percentage of "No" responses to fall short of
the true percentage not cleaning their weapons "regularly."




h. Questions that ask respondents to go against basic
inclinations.

Many  people  are  reluctant  to  criticize,  though  they  enjoy 
giving  praise.  Thus,  a question  that  allows  a respondent 
to  avoid  criticism  will  bias  his  answers;  similarly,  a 
question  that  offers  him  the  opportunity  to  criticize  may 
bias  responses  because  he  will  not  wish  to  do  so. 

Figure  VI-C-5  illustrates  this. 


Figure  VI-C-5 

Example  of  a Question 
Asking  the  Respondent  to  Criticize 

1.  Was  your  unit's  use  of  fire  and  maneuver  correct,  and  in 
accordance  with  current  Army  doctrine? 

Yes 

No 


If  no,  why  not? 


The question in Figure VI-C-5 asks the respondent
either to criticize his unit or to avoid criticism. Some
respondents might answer "No," if they have an important
point to make. However, a substantial number of others
will wash their hands of the whole affair and answer
"Yes," although they might feel that performance was not
completely correct.

i. Inclusion of different subjects into the same question.

Double-barreled (compound) questions, in which a respondent
can agree with one part of a question and disagree with
another, should be avoided. Consider, for example,
item 1 in Figure VI-C-6. Most respondents would probably
want to rate completeness and accuracy differently, since
in most situations research has shown that they are
negatively correlated. Therefore, the two aspects of
performance should be rated separately, as
shown in items 2 and 3 of Figure VI-C-6.



Figure VI-C-6

Examples of Double-Barreled Questions and Alternatives

1. How complete and accurate was the surveillance information?

   Very satisfactory
   Satisfactory
   Borderline
   Unsatisfactory
   Very unsatisfactory

2. How complete or incomplete was the surveillance information?

   Very complete
   Fairly complete
   Borderline
   Fairly incomplete
   Very incomplete

3. How accurate or inaccurate was the surveillance information?

   Very accurate
   Fairly accurate
   Borderline
   Fairly inaccurate
   Very inaccurate


It may be noted that in item 2 of Figure VI-C-6 both
"complete" and "incomplete" are included. Similarly, both
"accurate" and "inaccurate" are in the stem of item 3. To
use only one (e.g., "complete") in the stem would tend to
inflate the number of respondents selecting that alternative.




j.  Use  of  giveaway  words.  Avoid  words  which  lead  the  careful 
thinker  to  respond  in  the  negative  while  others,  thinking 
less  carefully,  respond  in  the  positive.  Consider  for 
example  the  question,  "Do  you  feel  that  your  unit  did  its 
best  in  all  contacts  over  the  past  six  months?"  One 
wonders  if  any  unit  can  do  its  actual  best,  except  very 
rarely.  The  word  "all"  makes  this  an  even  more  difficult 
question  to  answer  positively. 

k. Ambiguous questions. Vague or ambiguous words or questions
should be avoided. For example, the question "What is your
income?" is not sufficiently specific. The respondent may
give monthly or annual income, income before or after
taxes, his income or the family income, etc.

As another example, consider item 1 in Figure VI-C-7.


Figure VI-C-7

Example of Ambiguous Question and Alternative

1. Did you clean your rifle regularly in Vietnam?

   Yes
   No

2. How often, on the average, did you clean your rifle in Vietnam?

   Every day                Once every three days
   Once every two days      Once every four days
   Other (please specify):


Use of the word "regularly" without specification of the
time interval between cleanings is a defect in the question.
A respondent could justify a "Yes" by thinking to himself:
"Sure, I cleaned it regularly - once every four months."
Because of the self-exposure involved, the questionnaire
item approach to this topic is probably not capable of
providing an accurate estimate, but rewording could still
make the amount of underestimation less. So, if the data
cannot be collected by field inspection, the revised
questionnaire item could read like item 2 in Figure VI-C-7.


2. Formulation of the Response Alternatives


When formulating the response alternatives portion of a
questionnaire item, the following points should be kept
in mind:

a. All response alternatives should follow the stem both
grammatically and logically, and if possible, be parallel
in structure.

b.  If  it  is  not  known  whether  or  not  all  respondents  have 
the  background  or  experience  necessary  to  answer  an  item, 
(or  if  it  is  known  that  some  do  not),  a "Don't  know" 
response  alternative  should  be  included. 

c. When preference questions are being asked (such as
"Which do you prefer, the M16 or the M14 rifle?") the
"No preference" response alternative should usually
be  included.  The  identification  of  "No  preference" 
responses  permits  computation  of  whether  or  not  an 
actual  majority  of  the  total  sampled  are  pro  or  con. 

d.  The  use  of  the  "None  of  the  above"  option  or  variants 
of  it  such  as  "Not  enough  information"  is  sometimes 
useful. 

e.  The  option  "All  of  the  above"  may  on  rare  occasions  be 
useful.  It  seems  more  appropriate  to  academic  test 
questions  than  to  the  questioning  of  field  test 
participants. 

f. For most items, the questionnaire writer desires the
respondent to check only one response alternative.
Use of the parenthetic "(Check one.)" should eliminate
the selection of more than one alternative. It is very
important to make it clear to the respondent that he
may check more than one alternative in those fairly
rare instances where the questionnaire writer does
wish to permit this.

g. In some instances, response categories as long as a
sentence may be more desirable than short descriptors.
In rare cases, numbers may be used without verbal
descriptors, if the numbers have been previously defined.

h. Number of response alternatives is discussed in
Section VI-G, order of response alternatives in Section VI-H,
response anchoring in Chapter VII, and the order of
perceived favorableness of commonly used words and
phrases in Chapter VIII.
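
Point c above, on the value of a "No preference" alternative, can be illustrated with a short tally. The sketch below is hypothetical: the function name and the sample counts in the usage note are invented for illustration and do not come from any field test. It simply computes each alternative's share of all respondents sampled, so that "No preference" answers remain in the denominator.

```python
from collections import Counter


def preference_summary(answers):
    """Share of each alternative, computed over ALL respondents sampled,
    so that "No preference" answers count in the denominator."""
    counts = Counter(answers)
    total = len(answers)
    shares = {alt: n / total for alt, n in counts.items()}
    # An actual majority means more than half of everyone sampled.
    majority = next(
        (alt for alt, s in shares.items() if s > 0.5 and alt != "No preference"),
        None,
    )
    return shares, majority
```

With, say, 45 answers for the M16, 35 for the M14, and 20 of "No preference," the M16 leads among those who stated a preference (45 of 80) but does not hold a majority of the 100 sampled, which is the distinction the "No preference" alternative makes visible.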




3. Expressing Directionality and Intensity in Stem Versus
Response Alternatives

In  Item  1 of  Figure  VI-C-8,  directionality  (in  this  case, 
satisfaction)  is  expressed  in  the  question  stem. 



Figure VI-C-8

Alternate Ways of Expressing Directionality and Intensity

1. The M16 is a satisfactory rifle.

   Agree
   Disagree

2. The M16 is

   a satisfactory rifle.
   an unsatisfactory rifle.

3. The behavior of civilian employees of the PX toward enlisted
personnel is extremely offensive.

   Agree
   Disagree

4. The behavior of civilian employees of the PX toward enlisted
personnel is

   very offensive.
   somewhat offensive.
   neutral.
   somewhat pleasant.
   very pleasant.


In item 2 the directionality is expressed in the response
alternatives. In item 3 the stem contains terms of intensity
and directionality, while these terms are located in the
response alternatives in item 4. Item 2 is preferred to item 1,
and item 4 is strongly preferred to the item 3 approach.




The rationale for this preference is similar to the discussion
of positive versus negative terms. Those who check "Disagree"
to item 3 have not been permitted to indicate what it is they
would agree with (e.g., those who feel employees are offensive
but not extremely offensive would have to check "Disagree" as
would those who feel employees are very pleasant), whereas the
construction of item 4 does permit them to do so. It would
take five versions of item 3 to correct this deficiency and
achieve the coverage of opinion incorporated by the response
alternatives of item 4.


D.  Difficulty  of  Items 




1.  One  of  the  major  recommendations  advanced  by  almost  every 
general  source  on  how  to  write  sound  questionnaires  is  "keep 
it  simple."  Logic  dictates  that  words  used  in  surveys  should 
not  have  multiple  meaning,  nor  should  they  be  beyond  the  level 
of  vocabulary  of  the  typical  respondent.  Words,  phrases,  and 
sentence  structures  that  the  respondent  can  understand  should 
be  used. 

Consider item 1 in Figure VI-D-1. It contains too many
hard to understand words. Many respondents would have
difficulty understanding either the question or the response
alternatives. In the revision in item 2, the words have been
simplified, and a "catch-all" open-ended response alternative
added (to catch all other reasons).


Figure  VI-D-1 

Example  of  Hard  to  Understand  Item  and  Alternative 

1.  In the highly specialized counterinsurgency environment
represented by the basically internecine affair in Vietnam,
what would you say should represent the basic essence of our
rationale for continuation of our involvement?

    Prolongation of attrition of enemy forces, in order to
    reduce the level of threat to South Vietnam.

    Orderly transfer of military responsibility to the host
    country, in order to produce stabilized competency to
    deal with any future internal disturbances.

2.  What is our main reason for staying in Vietnam?  (Check one)

    To reduce the threat to South Vietnam by continuing
    the destruction of enemy forces.

    To assure South Vietnam's survival while it takes
    over responsibility for its own protection.

    Other (specify)




It should not be assumed that the respondent will understand
what the question writer is talking about. Consider, for
example, the question "Which do you prefer, dichotomous or
open questions?" The odds are that a fairly substantial
number of people would not be able to define these two
question types. However, if they are asked this question,
they will be happy to choose. The point is that people will
not volunteer their ignorance of something, though they may
admit it if you ask them. This caution goes beyond ignorance
of an issue. Another problem is that the specialist wording
the question may simply have an unusual command of his own
language. Scientific jargon has been criticized, but it is
perhaps overlooked that there are other kinds of jargon, too.
The question asker has a responsibility to make himself
understood. One way of screening for individuals who do not
have a basis for providing the information needed is to include
one or two pure information questions, planning to discard
questionnaire returns from respondents who cannot answer the
information questions correctly. However, our usual policy
should be to throw out or revise items that are not
understandable, rather than to throw out the responses of the
people who can't understand the item.

2.  Ways of Measuring Item Difficulty

Various procedures exist for determining the difficulty or
reading comprehension level of printed material. Such a
discussion is, however, beyond the scope of the preliminary
version of this manual. Sources that may be consulted include:

a.  Dale, E., & Chall, J. S. A formula for predicting readability.
Educational Research Bulletin, 1948, 27, 11-20, 37-54.

b.  Flesch, R. A new readability yardstick. Journal of Applied
Psychology, 1948, 32, 221-233.

c.  Fry, E. A readability formula that saves time. Journal of
Reading, 1968, 11, 513-516.

d.  Lorge, I. Predicting readability. Teachers College Record,
1944, 45, 404-419.

e.  Thorndike, E. L., & Lorge, I. The teacher's word book of
30,000 words. New York: Columbia University Press, 1944.
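
As a rough illustration of what such a procedure involves, the sketch below applies the Reading Ease formula from Flesch (source b above) to the two item stems of Figure VI-D-1. The formula constants are Flesch's; the syllable counter is a crude vowel-group heuristic assumed here for illustration, so the scores are approximate and should only be used to compare items with one another.

```python
import re

def count_syllables(word: str) -> int:
    """Rough syllable estimate: count groups of consecutive vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Heuristic assumption: a trailing "e" (but not "le") is usually silent.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease; higher scores mean easier reading.

    Expects text containing at least one word.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word

hard_item = (
    "In the highly specialized counterinsurgency environment "
    "represented by the basically internecine affair in Vietnam, "
    "what would you say should represent the basic essence of our "
    "rationale for continuation of our involvement?"
)
easy_item = "What is our main reason for staying in Vietnam?"

print(round(flesch_reading_ease(hard_item), 1))
print(round(flesch_reading_ease(easy_item), 1))
```

The revised item should score markedly higher (easier) than the original, which matches the judgment made in paragraph 1 above.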




E.  Length  of  Question/Stem 

This  section  notes  some  considerations  about  the  length  of  question 
stems.  There  is  little  research  in  this  area  to  guide  the  question- 
naire writer.  See  Section  IX-C  regarding  questionnaire  length. 

1.  It  is  sometimes  desirable  to  break  the  question  stem  into  two 

or  more  sentences  when  the  sentence  structure  would  otherwise  be 
unnecessarily  complex.  For  instance,  one  sentence  can  state 
the  situation,  and  one  can  pose  the  question.  Lengthy  question 
stems  that  try  to  explain  a complicated  situation  to  the  re- 
spondent should  be  avoided.  If  the  respondent  is  not  aware  of 
the  facts  presented,  he  may  become  more  confused  or  biased  than 
enlightened,  and  his  opinion  would  not  mean  much. 

2.  Longer open-ended questions do not necessarily produce more
information, or more accurate information, than shorter
ones. However, it may take more words to achieve a proper focus.

3.  Questionnaire  developers  have  a tendency  to  use  long  question 
stems  with  true-false  questions  when  "True"  is  the  correct 
answer.  Respondents  often  detect  and  react  to  this  tendency. 
Field  test  questionnaires,  however,  should  make  relatively 
little  use  of  "True"  and  "False"  response  alternatives.  These 
alternatives  are  more  appropriately  used  when  testing  whether 
respondents  have  acquired  a required  proficiency  level,  for 
example, the ability to visually recognize a given type of
enemy aircraft.




F.  Order  of  Question  Stems 

There  are  two  issues  to  consider  regarding  the  order  of  question 
stems.  The  first  has  to  do  with  the  order  of  questions  within  a 
series  of  items  that  are  designed  to  explore  the  same  topic  or 
subject  matter  or  related  subject  matter  areas.  The  second  has 
to  do  with  the  order  of  different  groups  of  questions  when  the 
groups  deal  with  fairly  separate  topics  or  subject  matter  areas. 
For  example,  one  group  of  questions  may  deal  with  factual  items, 
while  another  may  deal  with  attitudes.  If  items  bearing  on  the 
same  point  are  presented  in  succession,  the  respondent  can  pro- 
ceed more  readily  through  them.  Thus  this  is  usually  a desir- 
able practice.  An  exception  arises  when  one  wishes  to  check  the 
consistency of the respondent. To do this, two (or more) similar
items are included, but at widely different points in the
questionnaire.

1.  Order of Questions Within a Series of Items

a.  It is often recommended that the order of questions on an
instrument be varied or assigned randomly to avoid one
question  contaminating  another.  The  view  is  that  the 
immediately  preceding  question  or  group  of  questions  places 
the  respondent  in  a "mental  set"  or  frame  of  reference. 

For  example,  asking  respondents  a general  question  about 
their  feelings  regarding  automobile  exhaust  pollution  might 
influence  responses  to  the  question,  "Do  you  prefer  leaded 
or nonleaded gasoline?" Although this effect may be
prominent  in  specific  settings  or  with  specific  question- 
naires, there  is  little  evidence  in  the  literature  to 
support  its  general  existence. 

b.  Sometimes it is recommended that broad questions be asked
before specific questions. The rationale for this approach
is that the respondent can more easily and validly answer
specific questions after having had a chance to consider the
broader context. Also, asking the specific questions first
could influence the response to the broader question. Some-
times, however, it is best to start with the more specific
questions, especially when the respondent should have
experiences or issues in mind when he answers the more
general questions; or when the questionnaire deals with a
complex issue which the respondent may not have thought too
much about.

c.  The  order  of  questions  within  a series  of  items  will  also 
depend  upon  whether  filter  questions  are  needed.  A filter 
question is used to exclude a respondent from a particular
sequence of questions if those questions are irrelevant to
him.  For  example,  if  a series  of  items  were  asked  about 
different  kinds  of  weapons,  a "No"  response  to  a question 
such  as  "Have  you  ever  used  the  M14  rifle?"  might  be  used 
to  indicate  that  the  respondent  should  skip  the  following 
question(s)  about  the  M14. 
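
The filter question described in item c can be pictured as a small skip rule. The following is only a sketch: the item identifiers, the `skip_rules` table, and the `next_item` helper are hypothetical names invented for this illustration, not part of any questionnaire in this manual.

```python
def next_item(current_id, answer, skip_rules, item_order):
    """Return the id of the next item to present, honoring filter questions.

    skip_rules maps (item id, answer) to the item the respondent should
    jump to. Without a matching rule, the next item in order is returned.
    """
    target = skip_rules.get((current_id, answer))
    if target is not None:
        return target
    position = item_order.index(current_id)
    if position + 1 < len(item_order):
        return item_order[position + 1]
    return None  # end of the questionnaire

# Hypothetical item sequence: Q1 is the filter question about the M14.
item_order = ["Q1_used_m14", "Q2_m14_accuracy", "Q3_m14_handling", "Q4_next_weapon"]

# A "No" to the filter question skips the M14 follow-up items.
skip_rules = {("Q1_used_m14", "No"): "Q4_next_weapon"}

print(next_item("Q1_used_m14", "No", skip_rules, item_order))
print(next_item("Q1_used_m14", "Yes", skip_rules, item_order))
```

On a printed form the same rule appears as an instruction beside the "No" alternative, such as "skip to question 4."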

2.  Order  of  Different  Groups  of  Questions 

a.  There is usually a psychological or logical order in
which to ask the questions, so that the questionnaire
flows smoothly from one topic to the next and the re-
spondent is not shifted frequently from one topic to
another and back again. However, a shift from one
topic to another should be apparent to the respondent.

b.  It  is  usually  recommended  that  more  difficult  or  more 
sensitive  questions  be  asked  later  in  the  questionnaire, 
possibly  at  the  end. 

c.  One  or  more  easy,  nonthreatening  questions  should 
probably  be  asked  first  to  build  rapport.  They  should 
be  short  and  easy  to  understand  and  to  answer.  But 
they  should  not  be  irrelevant  to  the  objectives  of  the 
questionnaire.  Verbal  efforts  to  build  rapport  by  the 
questionnaire  administrator  seem  preferable  to  using 
questionnaire  content. 

3.  Effects of Order of Questions on Subjects' Responses

There is no evidence that the order of presentation of
questions on a questionnaire has any effect on the subject's
choice of response alternatives.




G.  Number  of  Response  Alternatives 

One of the basic issues in the use of rating questions or attitude
scales is the determination of the optimum number of response
alternatives or categories. Researchers' habit or tradition,
rather than solid empirical support, often has led to the recurrent
use of five-point rating scales, seven-point semantic differential
scales, and so on. The reason for concern with the number of
response alternatives stems from the belief that a "coarse" scale
with too few response alternatives may result in a loss of infor-
mation concerning the respondents' discrimination powers. It may
also reduce the respondents' cooperation in rating, as a coarse scale
"forces" judgments and thereby irritates some respondents. An
extremely "fine" scale, with too many response alternatives, may
go beyond the respondents' powers of discrimination, be excessively
time consuming, or be difficult to score.

The following subsections consider the number of response alternatives
to use in multiple choice, rating scale, and forced choice items.
Related topics are covered elsewhere: Section VI-C-3, formulation of
response alternatives; Section VI-H, order of response alternatives;
Chapter VII, response anchoring; Chapter VIII, order of perceived
favorableness of words and phrases.

1.  Number of Response Alternatives with Multiple Choice Items

No firm rules can be established regarding the number of response
alternatives to use with multiple choice items. It depends in
large part upon the question being asked and the number of answers
logically possible. The following considerations, however, may be
noted:

a.  There is some evidence that dichotomous items (items with only
two response alternatives) are statistically inferior to items
with more than two response alternatives.

b.  Dichotomous items are easier to score than nondichotomous
items, but they may not be accepted as well by the respondent.

c.  A good nondichotomous multiple choice item usually cannot
be written as a set of separate dichotomous items.

d.  Consideration should be given to the fact that many response
alternatives may make a questionnaire unduly time consuming.

e.  The number of choices logically possible or desirable should
constitute an upper limit on the number of response alter-
natives used for an item.




f.  Non-existent response alternatives may be checked by the
respondent  if  an  answer  sheet  is  used  which  has  more 
spaces  than  there  are  alternative  answers,  e.g.,  the 
answer  sheet  has  five  spaces  for  each  question  but  some 
questions  have  fewer  than  five  alternatives. 

2.  Number of Response Alternatives with Rating Scale Items

Authorities  in  psychometrics  contend  that  the  optimal  number 
of  response  alternatives  to  employ  with  rating  scales  is  a 
matter  for  empirical  determination  in  any  situation.  They 
also  suggest  that  considerable  variation  in  number  around  the 
optimal  number  changes  reliability  very  little.  These  con- 
clusions seem  to  be  supported  by  the  available  research 
literature.  Although  rules  regarding  the  number  of  response 
alternatives  to  use  with  rating  scales  cannot,  therefore,  be 
firmly  established,  the  following  issues  can  be  considered. 

a.  The  effects  of  increasing  or  decreasing  the  number  of 
response alternatives for a question cannot be generally
specified  with  certainty.  Increasing  the  number  of 
response  alternatives  does  not  necessarily  increase 
reliability,  and  there  is  no  consistent  relationship 
between  the  number  of  response  alternatives  and  validity. 

b.  J. P. Guilford (in Psychometric methods. New York:
McGraw-Hill, 1954) reported that seven response alter-
natives is usually lower than optimal, and it may pay in
some favorable situations to use up to 25 scale divisions.
Others believe that seven steps or five is optimal. Some
believe that five should be used for single or unipolar
(one direction) scales, nine for double or bipolar scales.
Many practitioners consistently use five-point scales.
Sometimes a nine-point hedonic (pleasure) scale is
recommended for food items, and a six-point scale for
other uses.

c.  The  number  of  response  alternatives  to  use  is  often 
determined  on  the  basis  of  the  degree  of  discrimination 
required. For example, a nine-point scale may sometimes
(but not always) give greater discrimination than a
three-point scale.

d.  Psychologists with considerable experience in military
operational field testing feel that anything more than
five alternatives is too great a number for many junior
enlisted personnel to discriminate among. More non-
responses are secured and the reliability of discrimination
of answered items is not increased.




e.  Questionnaire  administration  time  is  probably  a function 
of  the  number  of  response  alternatives. 

f.  There is some evidence that increasing the number of
response alternatives decreases the number of
nonresponses and uncertain responses (e.g., "Cannot decide").

g.  In  addition  to  the  response  alternatives  representing  the 
rating  scale  continuum,  it  may  be  necessary  to  add  alter- 
natives such  as  "Have  no  effect"  or  "No  opinion." 

h.  Scoring  and  data  analysis  considerations  may  affect  the 
selection  of  the  number  of  response  alternatives.  If 
Chi  square  tests  are  sufficient,  two  or  three  response 
alternatives  might  be  adequate.  However,  if  two  or  three 
response alternatives are used when nonparametric rank
order correlations are employed, substantial "ties" on
ranks will result. If parametric statistics are to be
employed,  more  alternatives  are  usually  better,  because 
of  the  assumption  of  continuous  distributions  or 
interval  scale  properties. 
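
The point in item h about ties can be made concrete with a small count. The sketch below computes the share of respondent pairs that receive the same rating, and therefore tied ranks, under a three-point and a nine-point scale. The two sets of ratings are invented for illustration only, and `tied_pair_fraction` is a hypothetical helper rather than a standard statistic.

```python
from collections import Counter

def tied_pair_fraction(ratings):
    """Fraction of respondent pairs that gave the same rating (tied ranks)."""
    n = len(ratings)
    total_pairs = n * (n - 1) // 2
    tied_pairs = sum(c * (c - 1) // 2 for c in Counter(ratings).values())
    return tied_pairs / total_pairs

# Invented ratings from ten respondents on each kind of scale.
three_point = [1, 2, 2, 3, 3, 3, 2, 1, 2, 3]
nine_point = [1, 4, 5, 8, 9, 7, 3, 2, 5, 8]

print(tied_pair_fraction(three_point))
print(tied_pair_fraction(nine_point))
```

With only three categories and ten respondents, at least four respondents must share a rating, so heavy tying is unavoidable. A finer scale permits fewer ties but does not guarantee them.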

3.  Number of Response Alternatives with Forced Choice Items

A number of different forced choice item formats have been
used, such as the following:

a.  Two phrases or statements per item, both favorable or
both unfavorable, choose the more descriptive or the
least descriptive.

b.  Three statements per item, all favorable or unfavorable,
choose the most and least descriptive statements in
each item.

c.  Four statements per item, all favorable, choose the two
most descriptive statements.

d.  Four statements per item, all favorable, choose the most
and least descriptive statements.

e.  Four statements per item, two favorable and two unfavor-
able, choose the most and least descriptive statements.

f.  Five statements per item, two of which were favorable,
one neutral, and two unfavorable in appearance, choose
the most and least descriptive.



The  evidence  is  not  clear,  but  three  or  four  statements 
per  item  may  be  preferable  to  two.  One  study  concluded 
that the format described in "c" above was superior to the
others. It was most bias resistant, yielded consistently
high validities under various conditions, had adequate
reliability, and was one of the best received by respondents.


H.  Order  of  Response  Alternatives 




1.  General  Considerations 

The experimental evidence on the effect that the order of
presentation of response alternatives for a question has on
a subject's choice of response is inconclusive and contra-
dictory. Varying conclusions include:

a.  Respondents  have  a tendency  to  select  the  first  response 
alternative  in  a set  more  than  the  others. 

b.  With multiple choice questions there is a tendency to choose
answers  from  the  middle  of  the  list,  if  the  list  consists 
of  numbers,  and  from  either  the  top  or  bottom  of  the  list, 
if  the  alternatives  are  fairly  lengthy  expressions  of  ideas. 

c.  Poorly  motivated  respondents  tend  to  select  the  center  or 
neutral  alternatives  with  rating  scale  items. 

d.  On items about which respondents feel strongly, the order
of alternatives makes no difference. On items about which
the respondent does not feel strongly, most will tend to
check the first alternative.

e.  The  positive  pole  of  rating  scale  response  alternatives 
should  be  presented  first  since  this  will  improve  the 
reliability  of  the  responses.  However,  it  is  important 
to  realize  that  reliability  may  increase  while  validity 
decreases.

Test item form biases are discussed in Section XII-B.

2.  Suggested Order for Multiple Choice Items

The following suggestions are offered regarding the order of
response alternatives in multiple choice items:

a.  When the response alternatives have an immediately apparent
logical order (e.g., they all relate to time) they should
be put in that order.

b.  When the response alternatives are numerical values, they
should in general be put in either ascending or
descending order.

c.  When  the  response  alternatives  have  no  immediately  apparent 
logical  order,  they  should  generally  be  put  in  random  order. 




d.  Alternatives  such  as  "None  of  the  above"  or  "All  of  the 
above"  should  always  be  in  the  last  position. 

e.  Alternate  questionnaire  forms  (e.g.,  where  the  order  of 
alternatives  is  reversed  on  half  of  the  forms)  are  often 
desirable. 
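
Suggestions a through d can be expressed as a short ordering routine. This is a sketch under stated assumptions: the function name `order_alternatives`, the `kind` labels, and the sample alternatives are all hypothetical, and numerical alternatives are assumed to be given as strings that convert cleanly to numbers.

```python
import random

# Catch-all alternatives that suggestion d places in the last position.
LAST_POSITION = ("None of the above", "All of the above")

def order_alternatives(alternatives, kind, seed=None):
    """Order response alternatives for a multiple choice item.

    kind: "logical" keeps the writer's order, "numeric" sorts ascending,
    "none" (no apparent logical order) shuffles at random.
    """
    catch_alls = [a for a in alternatives if a in LAST_POSITION]
    regular = [a for a in alternatives if a not in LAST_POSITION]
    if kind == "numeric":
        regular = sorted(regular, key=float)
    elif kind == "none":
        rng = random.Random(seed)  # a seed makes the order reproducible
        rng.shuffle(regular)
    elif kind != "logical":
        raise ValueError("kind must be 'logical', 'numeric', or 'none'")
    return regular + catch_alls

print(order_alternatives(["10", "2", "None of the above", "5"], "numeric"))
print(order_alternatives(["Radio", "None of the above", "Map", "Rifle"], "none", seed=1))
```

For the alternate forms of suggestion e, the regular alternatives from one form could simply be reversed for the other, with the catch-all alternatives left last.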

3.  Suggested  Order  of  Rating  Scale  Items 

Since  rating  scales  call  for  the  assignment  of  objects  along 
an  assumed  continuum  or  in  ordered  categories  along  the  con- 
tinuum, it  follows  that  the  response  alternatives  must  be  in 
order  from  "high"  to  "low"  or  "low"  to  "high",  with  the  choice 
of  words  for  "high"  and  "low"  (the  end  point  labels)  depending 
upon  the  continuum  being  used.  For  example,  for  the  continuum 
satisfactory-unsatisfactory,  item  1 in  Figure  VI-H-1  uses  the 
"high"  to  "low"  order,  while  item  2 uses  the  order  "low"  to  "high". 




Many practitioners use the "high" to "low" order. If one
has reason to believe that the order of the response alter-
natives makes a difference, or wishes to make certain that
it does not, then the use of alternate questionnaire forms is
recommended. Each alternate form should list the response
alternatives in a different order. The "good" or "high" end
of the scales should be at the same end of each scale for
all items in a given questionnaire form, but the order should
normally be reversed on 50% of the forms. For example, the
order shown in item 1 in Figure VI-H-1 would be used on half
of the forms, the order shown in item 2 on the other half.
(Normally, there would be only two questionnaire forms, one
with each order, but at times alternate forms are also
needed for other purposes. Hence, there may be more than
two.)
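
When two forms with reversed orders are used, the checked positions must be brought onto a common scale before the forms are combined. The sketch below shows one way this might be done. The form labels "A" and "B", the alternating assignment, and both function names are assumptions made for illustration.

```python
def assign_form(respondent_index):
    """Alternate the two forms so each order is used on half of the forms."""
    return "A" if respondent_index % 2 == 0 else "B"

def to_common_scale(position, form, n_alternatives=5):
    """Convert a checked position (1 = first listed alternative) to a score
    on which a higher number always means the "high" end of the continuum.
    """
    if form == "A":
        # Form A is assumed to list the alternatives from "high" to "low".
        return n_alternatives + 1 - position
    # Form B is assumed to list the alternatives from "low" to "high".
    return position

forms = [assign_form(i) for i in range(4)]
print(forms)
print(to_common_scale(1, "A"), to_common_scale(1, "B"))
```

Comparing the recoded score distributions of the two forms then gives a direct check on whether the order of alternatives made a difference.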




Chapter  VII:  Response  Anchoring 


A.  Overview 

This  chapter  has  to  do  with  the  "anchoring"  of  rating  scale 
responses,  that  is,  with  the  words  used  to  define  some  or  all 
of the response alternatives. Section VII-B shows various types
of  response  anchors,  while  Section  VII-C  discusses  anchored 
versus  unanchored  scales.  The  amount  of  verbal  anchoring  is  the 
topic  of  Section  VII-D,  while  some  procedures  for  the  selection 
of  verbal  scale  anchors  are  presented  in  Section  VII-E.  Finally, 
Section  VII-F  discusses  balanced  versus  unbalanced  scales. 

It should be noted that Section VI-C-3 discussed the formulation
of response alternatives, while the number and order of response
alternatives are the topics of Sections VI-G and VI-H, respectively.
The order of perceived favorableness of words and phrases is dis-
cussed in Chapter VIII.




B.  Types  of  Response  Anchors 

There  are  a number  of  different  types  of  response  anchors  that 
can  be  used  with  rating  scale  items.  Some  have  been  shown  as 
examples  in  other  chapters,  such  as  Section  VI-D.  Five  other 
types  of  response  anchors  are  shown  in  Figure  VII-B-1.  The  first 
shows  the  original  form  of  the  semantic  differential.  It  is  a 
combination  graphic  and  verbal  scale.  Respondents  were 
instructed  to  place  an  "X"  on  the  line  that  represented  their 
attitude.  The  use  of  verbal  anchors  with  a -5  through  +5 
numerical  continuum  is  shown  in  item  2 of  Figure  VII-B-1. 

Item 3 shows verbal anchors used with a 1 through 11 numerical
continuum. A combination verbal and numerical continuum is
shown in item 4, while a verbal continuum is shown in items 5
and 6. Item 6 is a typical Likert rating scale that calls for
a verbal  rating  to  a directional  statement  that  may  be  phrased 
either  positively  or  negatively.  An  example  might  be  "The 
Modern  Volunteer  Army  places  too  much  emphasis  on  extrinsic 
factors  (such  as  beer  in  the  barracks)  as  opposed  to  intrinsic, 
job  related  factors  (such  as  pay  or  supervision)." 

Sufficient  empirical  support  exists  to  conclude  that  the 
reliability  of  scales  with  verbal  anchors  and  verbal  response 
alternatives  is  superior  to  that  of  purely  numerical  scales. 


Figure VII-B-1
Types of Response Anchors


1.  Combination graphic and verbal scale.

    Strong  :  :  :  :  :  :  :  :  :  Weak


2.  Verbal anchors with a -5 through +5 numerical continuum.

    Definitely                          Definitely
    dislike                             like

    -5  -4  -3  -2  -1                  +1  +2  +3  +4  +5


3.  Verbal anchors with a 1 through 11 numerical continuum.

    Definitely                          Definitely
    dislike                             like

    1   2   3   4   5   6   7   8   9   10  11


4.  A verbal and numerical continuum.

    Dislike    Dislike   Dislike   Neither    Like     Like    Like
    complete-  some-     a         like nor   a        some-   complete-
    ly         what      little    dislike    little   what    ly


5.  A verbal continuum.

    Below     About     A little   A lot    One of     None
    average   average   better     better   the best   better


6.  A verbal continuum.  (Likert rating scale)

    Agree strongly   Agree   Undecided   Disagree   Disagree strongly




C.  Anchored Versus Unanchored Scales


A number  of  studies  have  been  conducted  on  the  topic  known  as 
"anchoring  effects."  Unfortunately,  the  research  evidence  is 
contradictory  as  to  whether  anchored  or  unanchored  scales  should 
be  used.  It  has  been  noted  that  unanchored  scales  may  well  be 
anchored  by  the  question  stem,  so  that  the  response  alternatives 
may  not  have  to  be.  When  only  one  end  of  a scale  is  anchored, 
some  studies  have  found  a tendency  for  respondents  to  move 
toward  that  extreme.  But  other  studies  have  found  the  opposite 
tendency.  At  least  one  study  found  that  judgment  time  is 
decreased  with  anchoring.  In  practice,  then,  it  is  usually 
best  to  use  anchored  scales. 




D.  Amount  of  Verbal  Anchoring 

Obviously the amount of verbal anchoring of a rating scale item
can vary. It can be anchored at the center, or on the ends or
both, or at many points on the entire continuum. There is some
evidence that more descriptive data can be obtained with more
anchoring, and that greater scale reliability is achieved with
added verbal anchoring. Scales with verbal descriptors for all
response alternatives may also be better predictors of behavior.
On the other hand, adding examples to definitions does not seem
to help too much. (See also Section VI-G regarding the number
of response alternatives to employ.)




E.  Procedures  for  the  Selection  of  Verbal  Scale  Anchors 


Some guidance can be offered regarding the selection of verbal
scale anchors. See also Chapter VIII.

1.  Scales  can  be  anchored  by  examples  of  expected  behavior 
based  upon  observations  of  behavior. 

2.  Pretests  for  the  selection  of  verbal  anchors  are  valuable  in 
building  scale  content.  Rather  than  employing  anchors  which 
seem  appropriate,  anchors  should  preferably  be  selected  by 
respondents  similar  to  those  who  will  be  participating  in 
the  study. 

3.  Scale endpoints that are unrealistically extreme, such that
few if any respondents would select them, should be avoided.
For example, it may be seldom that "Never" or "Always" apply,
so that the use of "Rarely" and "Usually" may be more appro-
priate. There are instances, however, where extreme state-
ments are realistic. The decision here often requires
experience with what is being rated.

4.  Analysis of data is normally facilitated if verbal scale
anchors selected for rating scales are of equal distance
from each other in terms of scale values. See, however,
Chapter VIII.
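
Point 4 can be checked mechanically once scale values for candidate anchors are in hand. The sketch below does so for two candidate sets. The scale values shown are made up for this example and are not taken from the lists in Chapter VIII, and the function names and the tolerance of 0.25 are assumptions.

```python
def anchor_gaps(anchors):
    """Sort (phrase, scale value) pairs by value and return the ordered
    pairs along with the gaps between neighboring scale values."""
    ordered = sorted(anchors, key=lambda pair: pair[1])
    gaps = [round(b[1] - a[1], 2) for a, b in zip(ordered, ordered[1:])]
    return ordered, gaps

def roughly_equal_spacing(gaps, tolerance=0.25):
    """True when the largest and smallest gaps differ by no more than
    the tolerance (an arbitrary threshold chosen for this sketch)."""
    return max(gaps) - min(gaps) <= tolerance

# Hypothetical scale values, invented for illustration only.
even_set = [("Rarely", 1.0), ("Sometimes", 2.1), ("Often", 3.0), ("Usually", 4.0)]
uneven_set = [("Rarely", 1.0), ("Sometimes", 1.4), ("Usually", 4.0)]

for candidate in (even_set, uneven_set):
    ordered, gaps = anchor_gaps(candidate)
    print([phrase for phrase, _ in ordered], gaps, roughly_equal_spacing(gaps))
```

A set that fails the check is a candidate for replacing one anchor with a phrase whose scale value falls nearer the middle of the widest gap.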




F.  Scale Balance, Midpoints, and Polarity

1.  Balanced  Versus  Unbalanced  Scales 

Historically,  balanced  scales  have  been  preferred  by  researchers. 
A scale  is  balanced  when  it  has  a number  of  positive  response 
alternatives  equal  to  the  number  of  negative  alternatives, 
regardless  of  the  presence  or  absence  of  an  "indifferent"  or 
neutral  category.  A "Don't  know"  response  alternative,  if 
present,  is  not  considered  to  be  part  of  the  scale,  so  is  not 
counted  when  deciding  if  the  scale  is  balanced.  See  the 
examples  of  balanced  and  unbalanced  scales  in  Figure  VII-F-1. 
Unbalanced  scales  may  be  employed  if  pretest  results  indicate 
that  many  respondents  will  be  choosing  extreme  response  alter- 
natives at  one  end  of  a scale,  producing  a skewed  distribution 
of  responses  rather  than  the  statistically  expected  normal 
distribution around the mean attitude. To reduce the piling
up of responses at one end of a scale (or to add to your
ability to discriminate among responses in that region), the
scale is made unbalanced by adding more response alternatives
on the side of the scale where the piling is likely to occur.

This  practice  tends  to  spread  the  distribution  of  responses 
more  evenly  along  the  scale  continuum. 

In  cases  where  one  has  no  advance  information  or  other  basis 
for  expecting  responses  to  be  largely  one-sided,  it  is  normally 
desirable  to  have  an  equal  number  of  positive  and  negative 
response  alternatives;  i.e.,  a balanced  scale. 
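
The definition of balance given above reduces to a simple count. In the sketch below, each response alternative is coded by hand as positive, negative, neutral, or "Don't know". This coding scheme and the function name `is_balanced` are assumptions introduced for illustration.

```python
def is_balanced(polarities):
    """Return True when positive and negative alternatives are equal in number.

    polarities holds one code per response alternative:
    "+" positive, "-" negative, "0" neutral, "dk" for "Don't know".
    Neutral and "Don't know" alternatives are not counted.
    """
    positives = sum(1 for p in polarities if p == "+")
    negatives = sum(1 for p in polarities if p == "-")
    return positives == negatives

# Two positive, one neutral, two negative, plus a "Don't know" alternative.
five_point_with_dk = ["+", "+", "0", "-", "-", "dk"]

# Three positive alternatives against two negative ones.
top_heavy = ["+", "+", "+", "-", "-"]

print(is_balanced(five_point_with_dk))
print(is_balanced(top_heavy))
```

Deciding which alternatives count as positive or negative remains a judgment for the questionnaire writer; the count only formalizes the comparison.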


2.  Midpoints

Scales may or may not include a midpoint or neutral response
alternative; this does not affect their classification as
balanced or unbalanced, but does affect their response
distributions. As examples, items 1c, 2a, and 3 in Figure
VII-F-1 show scales with no neutral point. One might exclude
the neutral point for items where it is judged that respondents
ought to have a sufficient basis for being pro or con and where
one desires to force respondents away from an "on the fence"
position. Bipolar scales should be balanced in terms of the
degree of extremeness denoted by the end point anchors. For
example, if "Never" is used, then "Always" should be used as
the opposite end point.

3.  Polarity 


Scales may be bipolar or unipolar. Item 3 in Figure VII-F-1
illustrates a unipolar scale. Its basic feature is that it
represents the thing being assessed as having from none to a
maximum - with n steps in between - of some property. The
question of balance only arises for bipolar scales. Many a
bipolar scale could be re-designed as a unipolar scale.
Instead of item 1c in Figure VII-F-1, one's question about
effectiveness (not given) could have been followed by this
unipolar scale of effectiveness: maximum effectiveness,
great effectiveness, moderate effectiveness, slight effec-
tiveness, and no effectiveness.

Semantic preferences may determine whether the question-
naire writer uses bipolar or unipolar scales.


Figure VII-F-1

Examples of Scale Balance, Midpoints, and Polarity

1.  Balanced bipolar scales.

a.  Very progressive                        b.  Effective
    Progressive                                 Fairly effective
    Moderately progressive                      Borderline
    Neither progressive nor conservative        Fairly ineffective
    Conservative                                Ineffective
    Very conservative

c.  Very effective                          d.  Very satisfied
    Somewhat effective                          Satisfied
    Somewhat ineffective                        Borderline
    Very ineffective                            Dissatisfied
                                                Very dissatisfied

2.  Unbalanced bipolar scales.

a.  Enthusiastic                            b.  Quite good
    Extremely favorable                         Rather good
    Very favorable                              Somewhat poor
    Favorable                                   Rather poor
    Fair                                        Quite poor
    Poor                                        Very poor

3.  Unbalanced scale (unipolar).

    Very much
    Much
    Some
    A little
    None




Chapter  VIII:  Empirical  Bases  for  Selecting 

Modifiers  for  Response  Alternatives 


A.  Overview 


When  constructing  a questionnaire,  it  is  often  necessary  to  select 
adjectives,  adverbs,  or  adjective  phrases  to  use  as  response  alter- 
natives. The  words  selected  for  response  alternatives  should  be 
clearly  understood  by  the  respondents  to  the  questionnaire  and  they 
should  have  precise  meaning.  There  should  be  no  confusion  among 
respondents  as  to  whether  one  term  denotes  a higher  degree  of 
favorableness  or  unfavorableness  than  another. 

There is no need to guess which phrases or words are the best
to use as response alternatives. Many studies have been conducted
in order to determine the perceived favorableness of commonly used
words and phrases. These studies have determined scale values and
variances for words and phrases which can be used to order the
response alternatives. Some of the studies have also identified
ambiguous words and words that are not appropriate to use as
response alternatives.

The results of these studies and the experience of questionnaire
designers have been incorporated into this chapter in order to offer
guidelines and suggestions to be used in selecting response
alternatives. This chapter includes lists of words and procedures to
use in selecting response alternatives. Many lists of phrases with
mean scale values and standard deviations are presented. The scale
values are given for the purpose of selecting response alternatives,
not for the purpose of assigning scale values to response
alternatives for data analysis purposes.

Section VIII-B discusses things to consider in selecting
response alternatives; Section VIII-C covers the selection of
response alternatives denoting degrees of frequency; Section VIII-D,
the selection of response alternatives using order of merit lists
of descriptor terms; Section VIII-E, the selection of response
alternatives using scale values and standard deviations.
Section VIII-F includes sample sets of response alternatives.

Scale values, standard deviations, and interquartile ranges
reported in this chapter have been taken from data presented in
the following studies:

1. Altemeyer, R. A. Adverbs and intervals: A study of Likert
   scales. Proceedings of the Annual Convention of the American
   Psychological Association, 1970, 3(pt. 1), 397-398.




VIII-A Page 2
1 Jul 76

2. Cliff, N. Adverbs as multipliers. Psychological Review, 1959,
   66, 27-44.

3. Dodd, S. C., & Gerberick, T. R. Word scales for degrees of
   opinion. Language and Speech, 1960, 18-31.

4. Gividen, G. M. Order of merit - descriptive phrases for
   questionnaires. Fort Hood, Texas: OCRD Army Research Institute
   Field Unit, 22 February 1973.

5. Jones, L. V., & Thurstone, L. L. The psychophysics of semantics:
   An experimental investigation. Journal of Applied Psychology,
   1955, 39, 31-36.

6. Matthews, J. J., Wright, C. E., & Yudowitch, K. Analysis of the
   results of the administration of three sets of descriptive
   phrases. Palo Alto: Operations Research Associates, March 1975.

7. Mosier, C. I. A psychometric study of meaning. Journal of
   Social Psychology, 1941, 13, 123-140.

8. Myers, J. H., & Warner, W. G. Semantic properties of selected
   evaluation adjectives. Journal of Marketing Research, 1968, 2,
   409-412.

9. U.S. Army Test and Evaluation Command. Development of a guide
   and checklist for human factors evaluation of Army equipment
   and systems. U.S. Army Test and Evaluation Command (TECOM), 1973.




VIII-B  Page  1 
1 Jul  76 


B.  General  Considerations  in  the  Selection  of  Response  Alternatives 

There are several ways of selecting response alternatives. These
ways are dependent on the purpose of the questionnaire and/or on
the way the data will be analyzed. There are specific considerations
when selecting response alternatives for balanced scales, when
selecting response alternatives with extreme values, and when
developing equal interval scales. There are also general things
to consider in the selection of any response alternative.

In some cases it is desirable to select response alternatives
on more than one basis. For example, mutually exclusive phrases
may also be selected on the basis of parallel wording.

1.  Matching  the  Question  Stem 

Descriptors should be selected to follow the question stem.
For example, if the stem asks for degrees of usefulness,
descriptors such as "Very useful" and "Of significant use"
should be used. In some cases this may mean rewording the
question stem so that appropriate response alternatives can
be selected.

2.  Mixing  Descriptors 

Descriptors on different continuums should usually not be
mixed. For example, "Average" should never be used with
quantitative terms or qualitative terms such as "Excellent"
or "Good" (since "average" performance for a group may very
well be excellent or good or even poor). If the descriptors
are selected for use with a question stem asking whether
something is satisfactory or unsatisfactory, the word
"Satisfactory" or "Unsatisfactory" (or a synonym) should
normally be in every response alternative, except perhaps
for a neutral response alternative.

Some experts go as far as to say that the wording of the
response alternatives should be parallel for balanced scales.
For example, if the phrase "Strongly agree" is used then the
phrase "Strongly disagree" should also be used. By reviewing
some of the studies that have determined scale values for
descriptors, it can be seen that some pairs of parallel
phrases are not equally distant from a neutral point or from
other phrases in terms of their scale values. Hence,
parallel wording may not always provide equally distant pro
and con response alternatives, although they may be perceived
as symmetrical opposites.


VIII-B  Page  2 
1 Jul  76 


Using descriptors from one continuum or descriptors with
parallel wording for a given questionnaire item has advantages.
The advantages are that the response alternatives will usually
fit the stem better, and they will be parallel to each other
in meaning and appearance.

3.  Selecting  Response  Alternatives  with  Clear  Meaning 

Some words are difficult for respondents to use in answering
questions. This difficulty may arise because the respondent
does not know the meaning of the word, or is not able to
rate the word in terms of degrees on specific scales. Such
words should not be used as response alternatives. Some
studies asked the respondent to indicate which words he was
unable to rate. Table VIII-B-1 lists examples of words that
were unrateable by subjects.

Table VIII-B-1

Words Considered Unrateable by Subjects

Phrase            Phrase
Adverse           Noxious
Appalling         Peerless
Base              Satiating
Despicable        Seemly
Expedient         Superlative
Fit

From: Mosier 1941a.

Some words appear to have two or more distinct meanings.
When these words are rated on a continuum of
favorableness-unfavorableness, many respondents will check around
one part of the scale while the other respondents will check around
a different place on the scale. It is said that these words
produce bimodality of response. Such words also should not
be used as response alternatives. A list of words exhibiting
bimodality of response is given in Table VIII-B-2.


VIII-B  Page  4 
1 Jul  76 

opposite  (effective  vs.  ineffective;  pleasing  vs.  unpleasing) 
for  two  of  the  terms.  A more  extreme  pair  can  be  produced  by 
using  "Very"  to  modify  these  two  terms. 

The first of several intended studies of how people rate/
order terms that might be used for rating scale descriptors
was conducted by Operations Research Associates and ARI just
prior to the writing of this manual. Its results may assist
questionnaire developers who need unbalanced scales or scales
with more than five descriptors. In the study each of 100
Army personnel was asked to assign a scale value ranging from
-5 (most negative) to +5 (most positive) to each term in
three different sets of terms, totaling over 100 descriptors.

Tables VIII-B-3 and VIII-B-4 give samples of descriptors
from this study for which mean scale values and standard
deviations have been calculated. The list in Table VIII-B-3
was derived by first selecting the descriptor with the largest
positive mean. The next descriptor selected has a mean that is
at least one standard deviation lower. The implication of the
gap of one standard deviation is that not more than 16% of the
people would have assigned a lower scale value to the first
descriptor than they did to the second descriptor, and vice
versa. To this extent the raters disagreed on the ordering of
these two terms when rating about 50. The third descriptor
on the list has a mean scale value yet another standard
deviation lower. This process was repeated until the
descriptor with the lowest mean scale value was selected. A
descriptor was not used if its standard deviation was greater
than 1.000.
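
The selection procedure just described can be sketched in a few lines of
Python. This is only an illustration under stated assumptions, not the
procedure as the study's authors coded it: the candidate pool is limited to
the entries of Tables VIII-B-3 and VIII-B-4 whose digits are fully legible,
the "one standard deviation" gap is taken to be the standard deviation of the
previously selected descriptor (the text does not say which descriptor's
standard deviation was used), and the names CANDIDATES, select_spaced and
LOWER_TAIL are hypothetical. The 16% figure appears to correspond to the
share of a normal distribution lying more than one standard deviation below
its mean, which the last lines compute.

```python
from statistics import NormalDist

# (phrase, mean scale value, SD): legible entries from Tables VIII-B-3
# and VIII-B-4. The pooled list is an assumption made for illustration.
CANDIDATES = [
    ("Very, very acceptable", 4.157, 0.825),
    ("Largely acceptable", 3.137, 0.991),
    ("Reasonably acceptable", 2.294, 0.722),
    ("Mildly acceptable", 1.686, 0.700),
    ("Barely acceptable", 1.078, 0.518),
    ("Sort of acceptable", 0.940, 0.645),
    ("Neutral", 0.000, 0.000),
    ("Barely unacceptable", -1.100, 0.300),
    ("Rather unacceptable", -2.020, 0.835),
    ("Substantially unacceptable", -3.235, 0.899),
    ("Completely unacceptable", -4.900, 0.361),
]


def select_spaced(candidates, max_sd=1.0):
    """Pick descriptors from the top down, skipping at least one SD each time.

    Descriptors with an SD above max_sd are dropped first. The gap is
    measured with the SD of the previously chosen descriptor (assumption).
    """
    usable = [c for c in candidates if c[2] <= max_sd]
    usable.sort(key=lambda c: c[1], reverse=True)  # largest mean first
    chosen = []
    for phrase, mean, sd in usable:
        if not chosen:
            chosen.append((phrase, mean, sd))
            continue
        _, last_mean, last_sd = chosen[-1]
        if mean <= last_mean - last_sd:
            chosen.append((phrase, mean, sd))
    return chosen


# Share of a normal distribution more than one SD below its mean,
# which is roughly the 16% mentioned in the text (normality is assumed).
LOWER_TAIL = NormalDist().cdf(-1.0)

for phrase, mean, sd in select_spaced(CANDIDATES):
    print(f"{phrase:<28}{mean:>8.3f}{sd:>8.3f}")
print(f"lower tail beyond one SD: {LOWER_TAIL:.3f}")
```

With this pool the sketch skips "Reasonably acceptable" and "Barely
acceptable" because each lies less than one standard deviation below the
descriptor chosen just before it.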

The  list  on  Table  VIII-B-4  was  constructed  again  by 
skipping  at  least  one  standard  deviation  between  adjacent 
terms;  however,  the  starting  point  was  at  the  middle,  with 
the  word  "neutral." 

Use of Table VIII-B-3 as a 10-descriptor unbalanced scale
is not highly recommended. If one wanted a nine-descriptor
scale, he could use the four adverbs appearing in front of
"Acceptable" in the table in that same location, and also use
them in front of "Unacceptable" in reverse order to create a
semantically balanced and ordered scale. Or, one could use
the five adverbs, now shown below "Neutral," both above and
below "Neutral" to create an 11-descriptor scale of
acceptability (or effectiveness, or satisfactoriness, etc.).
"Neutral," however, may not be a suitable midpoint term here
as the respondent who has neutral feelings (i.e., does not
know or does not care) might check this response, whereas
the term "neutral" is intended to specify, for example, a
midpoint between "barely acceptable" and "barely unacceptable."


VIII-B Page 5
1 Jul 76


Table VIII-B-3

Sample List of Phrases
Denoting Degrees of Acceptability

Phrase                           Mean       SD
Wholly acceptable               4.725     .56?
Highly acceptable               4.040     .63?
Reasonably acceptable           2.294     .722
Barely acceptable               1.078     .518
Neutral                          .000     .000
Barely unacceptable            -1.100     .300
Rather unacceptable            -2.020     .835
Substantially unacceptable     -3.235     .899
Highly unacceptable            -4.220     .576
Completely unacceptable        -4.900     .361

From: Matthews, Wright, and Yudowitch (1975). See
Section VIII-A 6.


Table VIII-B-4

A Second Sample List of Phrases
Denoting Degrees of Acceptability

Phrase                           Mean       SD
Very, very acceptable           4.157     .825
Largely acceptable              3.137     .991
Mildly acceptable               1.686     .700
Sort of acceptable               .940     .645
Neutral                          .000     .000
Barely unacceptable            -1.100     .300
Rather unacceptable            -2.020     .835
Substantially unacceptable     -3.235     .899
Highly unacceptable            -4.294     .535
Completely unacceptable        -4.900     .361

From: Matthews, Wright, and Yudowitch (1975). See
Section VIII-A 6.




VIII-B  Page  6 
1 Jul  76 

While the scale values from the studies cited are useful,
further refinement is possible. That is, once having selected
a candidate scale (set of descriptors) one could then conduct
another study to determine if relevant judges would assign
scale values indicating equal intervals (among means) for the
terms on the candidate scale.

6.  Selecting  Descriptors  for  End  Points 

Once the decision has been made as to how extreme the endpoints
of a scale should be (see Section VII-E 4), the descriptors
should be selected accordingly. If extreme end points are
desired, descriptors that have extreme meaning should be
selected. One guideline that can be used in selecting these
descriptors is to use those that have the highest and lowest
scale values. Another guideline is to review the descriptors
in terms of their apparent meanings. If less extreme end
points are desired, descriptors that do not have extreme
scale values and that do not have the apparent extreme
meanings should be selected.

7.  Selecting  Midpoint  Responses 

In selecting a descriptor for a midpoint response, it is
necessary to use a descriptor that is neutral in meaning.
Some of the commonly used midpoints do not appear as neutral
to some respondents as might be expected.

Table VIII-B-5 lists several neutral terms with their
scale values and standard deviations. This list may be
helpful in selecting midpoint responses.

Words commonly used for midpoint responses are discussed
below:


a. Average.

"Average" should never be used in conjunction with adjectives
such as "Excellent," "Good," etc. "Average" has no meaning
when used with these words. For example, "average"
performance may be superior or it may be completely unsatisfactory.
Furthermore, most evaluators do not have the experience or
competence to even know what an "average" performance is.
Typically, when "Average" is used on a field test evaluation
form only 5% or 10% of respondents rate the subject as below
average and 30% or 40% rate it above average. The data
from such a question indicate that the response alternatives
are not well formulated. Therefore, as a general rule, it
is usually inappropriate to use any term of "Average" in a
questionnaire, and it is always inappropriate to use
"Average" in conjunction with phrases such as "Excellent,"
"Good," "Poor," etc.


VIII-B  Page  7 
1 Jul  76 




Table VIII-B-5

Neutral Term Scale Values and Standard Deviations
as Determined by Several Different Studies

                                  Mean                Theoretical
                                  Scale               Neutral
Term                              Value     SD        Scale Value
About average                     3.77      .85        3.50
Acceptable                         .73      .66         .00
Acceptable                       11.12     2.59       10.00
Acceptable                        2.39     1.46         .00
All right                        10.76     1.42       10.00
Average                           3.08      --         3.00
Average                            .86     1.08         .00
Average                          10.84     1.55       10.00
Borderline                        -.02      .32         .00
Borderline                         .00      .20         .00
Borderline                        -.06      .31         .00
Doesn't make any difference       2.83     1.73(a)     5.00
Don't know                        4.82      .82(a)     5.00
Fair                              6.50      --         5.50
Fair                               .78      .85         .00
Fair                              9.52     2.06       10.00
Fair                              4.96      .77(a)     5.00
Neutral                            .00      .00         .00
Neutral                            .02      .18         .00
Neutral                           9.80     1.50       10.00
Neutral                          10.18     2.01       10.00
Normal                            6.70     1.43        6.00
Ordinary                          6.51     1.43        6.00
O.K.                               .87     1.24         .00
O.K.                             10.28     1.67       10.00
So-so                            10.08     1.87       10.00
Undecided                         4.76     3.73(a)     5.00

(a) Interquartile range shown rather than the standard deviation.

If  "Average"  is  used,  it  should  be  with  extreme  care  and 
only  when  one  is  interested  in  comparing  performances  or  items 
with  each  other.  It  should  not  be  used  when  one  desires  to 
find  out  how  "good"  or  how  "bad"  an  item  or  performance  is. 
Significantly  above  average  performance  may  be  extremely 
unsatisfactory. 


VIII-B  Page  8 
1 Jul  76 

b.  No  opinion. 

"No opinion" is unacceptable as a neutral term, as it
usually denotes that a person has no opinion due to lack
of knowledge or due to not having thought about an issue.

"No  opinion"  can  be  used  as  a response  alternative  if  it 
represents  a specific  type  of  information  that  is  wanted. 

c. Neutral.

"Neutral" is considered as a less desirable term to use
than "Borderline." Although every respondent in the
study gave the term zero, the meaning on a questionnaire
is not clear (see page VIII-B 4). Two out of 52 respondents
indicated it was unrateable. In another study "Neutral"
had a mean scale value of .02 and a standard deviation of
.18. Because of the ambiguity of meaning of "neutral"
(e.g., feeling of the respondent versus midpoint alternative)
it is not recommended that it be used as a midpoint on most
questionnaires.

d. Marginal.

"Marginal" is sometimes used as a midpoint response
alternative. Interviews with test subjects indicated
that the term "Marginal" in most cases had a meaning of
above "Borderline" or still satisfactory, but very close
to being unsatisfactory. Hence, indications are that
there may be more desirable terms to use than "Marginal."

e. Borderline.

"Borderline" is preferred by some experts as a midpoint
response. In an administration to Fort Hood soldiers of
over 1,500 questionnaires using the term "Borderline" as
a midpoint, there was not one instance of reported
confusion among those completing the questionnaires.
However, there are times when "Borderline" has a larger
standard deviation than "Neutral." (Again, "neutral" by
definition implies zero to most persons, but its frame
of reference is ambiguous.)

f.  Uncertain. 




"Uncertain"  is  unacceptable  as  a neutral  term  as  it  implies 
that  with  additional  knowledge  or  thought  a decision  could 
be  made  that  would  fall  into  one  of  the  other  categories. 


VIII-B Page 9
1 Jul  76 


g. Undecided.

"Undecided" is also unacceptable as a neutral item for the
same reasons as "Uncertain."


h. Neither agree nor disagree.

"Neither agree nor disagree" and similar descriptors
written in this form may be used as midpoint responses.
They have the advantage of paralleling the rest of the
descriptors in the set, and they denote a position
exactly in the middle of the end points. This term, like
"neutral," can also imply uncertainty, indecision or a
lack of knowledge rather than a firm knowledge that it
represents a midpoint.


i. No effect.

"No effect" may be employed as a neutral term when it
is used with a set of descriptors to measure the type
of effect that an activity will have. For instance, it
can be used on a continuum from beneficial to detrimental.


j. Ordinary.

"Ordinary" should not be used as a neutral item. In one
study its scale value showed marked skewing at the low
extreme, indicative of the common use of "ordinary" to
imply inferiority.


k. Fair.

"Fair" should not be used as a neutral item. In one study
the median scale value for "fair" was a full point above
the neutral point. It appears for some subjects that the
meaning of "fair" is distinctly favorable.


l. Acceptable.

"Acceptable" is not a desirable word to use as a neutral
item. In one study it exhibited a marked bimodality of
response, indicating that subjects disagreed on the degree
of favorableness denoted by the term. In a recent study
"Acceptable" had a large standard deviation of 1.46.


VIII-B  Page  10 
1 Jul  76 


m.  Normal. 

"Normal"  is  not  a desirable  word  to  use  as  a neutral  item. 
In  one  study  it  exhibited  a marked  bimodality  of  response, 
indicating  that  the  word  "normal"  has  different  meanings 
for  different  subjects.  This  term  would  be  classified  as 
a synonym  for  "average." 

n.  Medium. 

"Medium"  may  possibly  be  used  as  a neutral  term.  In  one 
study  there  was  a piling  up  of  judgments  for  "Medium"  at 
the  neutral  scale  position. 

o. O.K. or all right.

"O.K." and "All right" have sometimes been used as midpoint
response alternatives. However, they have a tendency to be
rated more positively than neutral. They also have larger
standard deviations than other terms mentioned, indicating
that there is ambiguity in their meaning.

p. So-so.

"So-so" is another term sometimes used as a midpoint
response. In one study it had a scale value of 10.08,
which was very close to the neutral scale value of 10.00,
but it also had a fairly large standard deviation of 1.87.
Its use is not recommended.

q.  Don't  know. 

"Don't  know"  is  an  unacceptable  term  to  use  as  a middle 
point.  It  usually  means  to  the  subject  that  with 
additional  knowledge  or  more  time  to  think  about  the 
issue,  he  could  choose  one  of  the  other  alternatives. 

r. Doesn't make any difference.

"Doesn't make any difference" should not be used as a
midpoint response alternative because it implies a more
negative value than a neutral value. In one study it
had a scale value of 2.83, where the neutral scale value
was 5.00. It also had an interquartile range of 3.13,
which means that there was a lot of disagreement among
subjects as to its meaning.
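
The two checks applied repeatedly in the discussion above (how far a term's
mean lies from the theoretical neutral value, and how large its standard
deviation is) can be sketched as a simple screen. The rows are taken from
Table VIII-B-5 and are limited to terms scaled with a theoretical neutral
value of .00. The cut-off values max_offset and max_sd are hypothetical
choices made for this illustration only; the manual gives no numeric
thresholds, and because the rows come from more than one study the
comparison is rough. Semantic objections, such as the ambiguity of
"Neutral," are not captured by such a screen.

```python
# (term, mean scale value, SD, theoretical neutral value) from Table VIII-B-5
ROWS = [
    ("Acceptable", 2.39, 1.46, 0.00),
    ("Average", 0.86, 1.08, 0.00),
    ("Borderline", -0.02, 0.32, 0.00),
    ("Fair", 0.78, 0.85, 0.00),
    ("Neutral", 0.02, 0.18, 0.00),
    ("O.K.", 0.87, 1.24, 0.00),
]


def offset_from_neutral(mean, neutral):
    """Signed distance of a term's mean from the theoretical neutral value."""
    return round(mean - neutral, 2)


def screen_midpoints(rows, max_offset=0.25, max_sd=0.50):
    """Keep terms that sit near the neutral value and show little spread.

    Both thresholds are hypothetical and apply only within one scale.
    """
    kept = []
    for term, mean, sd, neutral in rows:
        near_neutral = abs(offset_from_neutral(mean, neutral)) <= max_offset
        if near_neutral and sd <= max_sd:
            kept.append(term)
    return kept


print(screen_midpoints(ROWS))
```

Under these assumed thresholds only "Borderline" and "Neutral" pass the
numeric screen.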


VIII-B  Page  12 
1 Jul  76 


What are the consequences to the developer of rating scale
items of discovering a near 50%-50% split as in the ordering
of "Outstanding" and "Superior"? Does it mean they cannot be
used together as part of the descriptors of a rating scale
item? The answer is, "Normally yes." In Figure VIII-B-1,
we would have better discrimination if "Outstanding" were
replaced by "Excellent," with the position formerly occupied
by "Excellent" being filled by "Very good." "Superior" and
"Outstanding" or similarly overlapping terms should normally
not be used on the same scale.


Figure VIII-B-1

Two Formats Using "Outstanding" and "Superior"

1.  1. Superior
    2. Outstanding
    3. Excellent
    4. Good
    5. Fair
    6. Poor

2.  Superior  Outstanding  Excellent  Good  Fair  Poor

    (Circle one word)


When functioning as questionnaire consultants or developers
in field test situations where respondents are enlisted personnel,
ARI has recommended and used very little variety in its rating
scale items. Arrays such as those shown in Figure VIII-B-2 are
almost always proposed and used. Sometimes the middle term is
deleted. Several reasons for the lack of variety are that a
standard simple format 1) facilitates comparability of rating
distributions with previous tests, and 2) facilitates
understanding by soldier respondents, who are often not high school
graduates.




VIII-B  Page  11 
1 Jul  76 


8.  Selecting  Positive  and  Negative  Descriptors 

If  a balanced  scale  is  desired,  it  is  necessary  to  select  an 
equal  number  of  positive  and  negative  descriptors.  In  most 
cases  it  is  easy  to  determine  if  a descriptor  is  positive 
or  negative  by  seeing  on  which  side  of  the  neutral  point  its 
scale  value  falls.  For  example,  "Mildly  like"  has  a positive 
scale  value,  and  "Mildly  dislike"  has  a negative  scale  value. 

9.  Selecting  Terms  Showing  Equal  Intervals 

Some  experts  argue  that,  in  order  to  perform  analyses  on  the 
basis  of  numerical  values  or  weights,  the  intervals  between 
rating  scale  response  alternatives  should  be  equal.  This 
would  be  desirable,  but  in  many  cases  it  is  impossible  because 
many  words  have  not  been  assigned  scale  values.  But  when 
scale  values  are  available,  the  response  alternatives  can  be 
selected  as  equally  distant  apart  as  possible  when  doing  so 
is  considered  important. 

There is a tendency for some questionnaire constructors
to select phrases with parallel wording to indicate equal
intervals. (They may also do so for other reasons.)
However, if equal intervals are considered important, phrases
should be selected based upon scale values if available.
For example, in Table VIII-E-9 "Highly adequate" has a
scale value of 3.843 while the parallel term "Highly inadequate"
has a scale value of -4.196. This places "Highly inadequate"
further away from the neutral point than "Highly adequate."
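
The comparison in this example is simple arithmetic, sketched below. The
function name asymmetry and the assumption that the neutral point of the
scale is 0.0 are illustrative choices; only the two scale values come from
the text.

```python
def asymmetry(positive_value, negative_value, neutral=0.0):
    """How much farther the negative term lies from neutral than the positive.

    A result above zero means the negative phrase is the more extreme
    of the two parallel phrases.
    """
    return abs(negative_value - neutral) - abs(positive_value - neutral)


# "Highly adequate" (3.843) versus "Highly inadequate" (-4.196)
gap = asymmetry(3.843, -4.196)
print(f"Negative term is farther from neutral by {gap:.3f} scale units")
```

A result near zero would indicate that the parallel phrases are also close
to equidistant from the neutral point.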

10. Use of Unscaled Terms


Some discussion is in order regarding the use of terms ignoring
their scale values or to which no scale values have been
assigned. An illustration of the first of these practices is
from a study in which ARI had 21 Army officers involved in
operational field testing rank-order 16 terms that included
"Outstanding," "Superior," "Excellent" and "Very Good."
"Excellent" was ranked as less positive than "Outstanding"
by 14 of the officers, while it was ranked as less positive
than "Superior" by 17 of the officers. However, there was
maximum disagreement as to whether "Outstanding" or "Superior"
was first or second on the scale. That is, 12 rated
"Superior" first and "Outstanding" second, while nine of the
officers assigned the reverse ordering to these two words.
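
The degree of agreement in these rank-order counts can be expressed as the
share of judges who gave the majority ordering. The short sketch below uses
only the counts reported above; the function name majority_share is a
hypothetical label, and treating the majority share as the measure of
agreement is an assumption made for illustration.

```python
def majority_share(n_one_order, n_judges):
    """Share of judges who gave the more common of the two orderings."""
    n_other = n_judges - n_one_order
    return max(n_one_order, n_other) / n_judges


N_OFFICERS = 21
pairs = {
    "Excellent below Outstanding": 14,
    "Excellent below Superior": 17,
    "Superior above Outstanding": 12,
}

for label, count in pairs.items():
    print(f"{label}: {majority_share(count, N_OFFICERS):.0%} agreement")
```

A share close to 50% means the two terms are effectively interchangeable in
rank for this group of judges, while a share close to 100% means the judges
agree on the ordering.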

All officers ranked "Outstanding," "Superior," and "Excellent"
as more positive than "Very Good." "Outstanding" is sometimes
interpreted to denote only that the performance is among the best
of a group - without any implication as to quality, e.g.,
although a student's grade of 6? out of 100 points was failing,
his performance may have been "Outstanding" since no other
student in the class scored above 60!


VIII-B  Page  13 

1 Jul  76 


Figure  VIII-B-2 

Response  Alternatives 
Frequently  Recommended  by  ARI 

( ) Very  satisfactory 

( ) Satisfactory 

( ) Borderline 

( ) Unsatisfactory 

( ) Very  unsatisfactory 

( ) Very  effective 
( ) Effective 
( ) Borderline 
( ) Ineffective 
( ) Very  ineffective 

( ) Very  acceptable 
( ) Acceptable 
( ) Borderline 
( ) Unacceptable 
( ) Very  unacceptable 


VIII-C  Page  1 
1 Jul  76 



C.  Selection  of  Response  Alternatives  Denoting  Degrees  of  Frequency 


Some questionnaire designers use verbal descriptors to denote
degrees of frequency. Table VIII-C-1 shows such a list of verbal
descriptors. A study showed that there was a great deal of
variability in meaning for frequency phrases. Questionnaires should,
whenever possible, use response alternatives that include a number
designation or percentage of time meant by each word used as a
response alternative.


Table VIII-C-1

Degrees of Frequency

                    Scale     Interquartile
Phrase              Value     Range
Always              8.99       .52
Without fail        8.89       .61
Often               7.23      1.02
Usually             7.17      1.36
Frequently          6.92       .77
Now and then        4.79      1.40
Sometimes           4.78      1.83
Occasionally        4.13      2.06
Seldom              2.45      1.05
Rarely              2.08       .61
Never               1.00       .50

From: Dodd and Gerberick (1960). See
Section VIII-A 3.
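
One way to see the variability in Table VIII-C-1 is to flag neighboring
phrases whose scale values differ by less than the spread of judgments
around them. The sketch below is illustrative only: the rule used (a pair is
flagged when the difference in scale values is smaller than the larger of
the two interquartile ranges) is an assumption and is not taken from Dodd
and Gerberick, and the names FREQUENCY_TERMS and overlapping_neighbors are
hypothetical.

```python
# (phrase, scale value, interquartile range) from Table VIII-C-1
FREQUENCY_TERMS = [
    ("Always", 8.99, 0.52),
    ("Without fail", 8.89, 0.61),
    ("Often", 7.23, 1.02),
    ("Usually", 7.17, 1.36),
    ("Frequently", 6.92, 0.77),
    ("Now and then", 4.79, 1.40),
    ("Sometimes", 4.78, 1.83),
    ("Occasionally", 4.13, 2.06),
    ("Seldom", 2.45, 1.05),
    ("Rarely", 2.08, 0.61),
    ("Never", 1.00, 0.50),
]


def overlapping_neighbors(terms):
    """Return adjacent pairs that respondents are likely to confuse.

    Terms are sorted by scale value; a pair is flagged when the gap
    between the two values is smaller than the larger of their
    interquartile ranges (an assumed rule of thumb).
    """
    ordered = sorted(terms, key=lambda t: t[1], reverse=True)
    flagged = []
    for (p1, v1, iqr1), (p2, v2, iqr2) in zip(ordered, ordered[1:]):
        if v1 - v2 < max(iqr1, iqr2):
            flagged.append((p1, p2))
    return flagged


for first, second in overlapping_neighbors(FREQUENCY_TERMS):
    print(f"{first} / {second}")
```

Pairs flagged this way are candidates for replacement by a number or a
percentage of time, as recommended above.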




VIII-D  Page  1 
1 Jul  76 

D.  Selection  of  Response  Alternatives  Using  Order  of  Merit  Lists 
of  Descriptor  Terms 

An order of merit list of descriptors does not provide scale
values nor show the variance of each phrase on some continuum.
In addition, the list does not represent an equal interval
scale. However, such lists are still useful for selecting
response alternatives, if the main concern is to select response
categories so that each respondent will agree on the relative
degree of "goodness" of the terms. Tables VIII-D-1 and VIII-D-2
give examples of order of merit lists of descriptor terms.


Table VIII-D-1

Order of Merit of Selected Descriptive Terms

Order of merit     Descriptive term
 1                 Very superior
 2                 Very outstanding
 3                 Superior
 4                 Outstanding
 5                 Excellent
 6                 Very good
 7                 Good
 8                 Very satisfactory
 9                 Satisfactory
10                 Marginal
11                 Borderline
12                 Poor
13                 Unsatisfactory
14                 Bad
15                 Very poor
16                 Very unsatisfactory
17                 Very bad
18                 Extremely poor
19                 Extremely unsatisfactory
20                 Extremely bad

From: Gividen (1973). See Section VIII-A 4.


Table VIII-D-2

Order of Merit of Descriptive Terms
Using "Use" as a Descriptor

Order of merit     Descriptive term
 1                 Extremely useful
 2                 Very useful
 3                 Of significant use
 4                 Of considerable use
 5                 Of much use
 6                 Of moderate use
 7                 Of use
 8                 Of some use
 9                 Of little use
10                 Not very useful
11                 Of slight use
12                 Of very little use
13                 Of no use

From: Gividen (1973). See Section VIII-A 4.




VIII-E  Page  1 
1 Jul  76 

E. Selection of Response Alternatives Using Scale Values and Standard
Deviations


Using  scale  values  and  standard  deviations  to  select  response 
alternatives  will  give  a more  refined  set  of  phrases  than  using 
an  order  of  merit  list.  Other  sections  above  have  discussed 
specific  considerations  in  selecting  descriptors.  In  general, 
response  alternatives  selected  from  lists  of  phrases  with  scale 
values  should  usually  have  the  following  characteristics: 

1. The scale values of the terms should be as far apart as possible.

2. The scale values of the terms should be as equally distant as
possible.

3. The terms should have small variability (small standard
deviations or interquartile ranges).

4. Other things being equal, the terms should have parallel
wording.

Tables VIII-E-1 through VIII-E-24 give lists of phrases which
have scale values and, when possible, standard deviations or
interquartile ranges. They are based on empirical evidence, and may be
used to select response alternatives.
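
The first three characteristics listed above can be checked numerically for
any candidate set of terms. The following sketch is a hedged illustration:
the five-term candidate set was chosen only as an example, its scale values
and standard deviations are those given for the same phrases in Table
VIII-E-3, and the summary measures (smallest gap, spread of gaps, largest
standard deviation) are assumed stand-ins for "far apart," "equally
distant," and "small variability." The name summarize_scale is hypothetical.

```python
# (phrase, scale value, SD) for an example candidate set, values as in
# Table VIII-E-3
CANDIDATE_SET = [
    ("Excellent", 3.71, 1.01),
    ("Good", 1.91, 0.76),
    ("Fair", 0.78, 0.85),
    ("Poor", -1.55, 0.87),
    ("Terrible", -3.09, 0.98),
]


def summarize_scale(terms):
    """Summarize spacing and variability for a candidate set of terms."""
    ordered = sorted(terms, key=lambda t: t[1], reverse=True)
    values = [value for _, value, _ in ordered]
    gaps = [upper - lower for upper, lower in zip(values, values[1:])]
    return {
        "min_gap": round(min(gaps), 2),                 # far apart?
        "gap_spread": round(max(gaps) - min(gaps), 2),  # equally distant?
        "max_sd": round(max(sd for _, _, sd in ordered), 2),  # variability
    }


print(summarize_scale(CANDIDATE_SET))
```

A larger min_gap, a smaller gap_spread, and a smaller max_sd each point
toward a better set under the criteria above. Parallel wording, the fourth
characteristic, still has to be judged by inspection.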


VIII-E  Page  2 
1 Jul  76 




Table VIII-E-1

Acceptability Phrases

Phrase                                     Average     SD
Excellent                                  6.27         .54
Perfect in every respect                   6.22         .86
Extremely good                             5.74         .81
Very good                                  5.19         .75
Unusually good                             5.03         .98
Very good in most respects                 4.62         .72
Good                                       4.25         .90
Moderately good                            3.58         .77
Could use some minor changes               3.28        1.09
Not good enough for extreme conditions     3.10        1.30
Not good for rough use                     2.72        1.15
Not very good                              2.10         .85
Needs major changes                        1.97        1.12
Barely acceptable                          1.79         .90
Not good enough for general use            1.76        1.21
Better than nothing                        1.22        1.08
Poor                                       1.06        1.11
Very poor                                   .76         .95
Extremely poor                              .36         .76

From: U.S. Army (1973). See Section VIII-A 9.


Table VIII-E-2

Degrees of Excellence: First Set

Phrase            Scale Value    SD

Superior            20.12       1.17
Fantastic           20.12       0.83
Tremendous          19.84       1.31
Superb              19.80       1.19
Excellent           19.40       1.73
Terrific            19.00       2.45
Outstanding         18.96       1.99
Wonderful           17.32       2.30
Delightful          16.92       1.85
Fine                14.80       2.12
Good                14.32       2.08
Pleasant            13.44       2.06
Nice                12.56       2.14
Acceptable          11.12       2.59
Average             10.84       1.55
All right           10.76       1.42
O.K.                10.28       1.67
Neutral              9.80       1.50
Fair                 9.52       2.06
Mediocre             9.44       1.80
Unpleasant           5.04       2.82
Bad                  3.88       2.19
Very bad             3.20       2.10
Unacceptable         2.64       2.04
Awful                1.92       1.50
Terrible             1.76        .77
Horrible             1.48        .87

From: Myers and Warner (1968). See Section VIII-A 8.


Table VIII-E-3

Degrees of Excellence: Second Set

Phrase             Scale Value    SD

Best of all           6.15       2.48
Excellent             3.71       1.01
Wonderful             3.51        .97
Mighty fine           2.88        .67
Especially good       2.86        .82
Very good             2.56        .87
Good                  1.91        .76
Pleasing              1.58        .65
O.K.                   .87       1.24
Fair                   .78        .85
Only fair              .71        .64
Not pleasing          -.83        .67
Poor                 -1.55        .87
Bad                  -2.02        .80
Very bad             -2.53        .64
Terrible             -3.09        .98

From: Jones and Thurstone (1955). See Section VIII-A 5.


Table VIII-E-4

Degrees of Like and Dislike

Phrase               Scale Value    SD

Like extremely          4.16       1.62
Like intensely          4.05       1.59
Strongly like           2.96        .69
Like very much          2.91        .60
Like very well          2.60        .78
Like quite a bit        2.32        .52
Like fairly well        1.51        .59
Like                    1.35        .77
Like moderately         1.12        .61
Mildly like              .85        .47
Like slightly            .69        .32
Neutral                  .02        .18
Like not so well        -.30       1.07
Like not so much        -.41        .94
Dislike slightly        -.59        .27
Mildly dislike          -.74        .35
Dislike moderately     -1.20        .41
Dislike                -1.58        .94
Don't like             -1.81        .97
Strongly dislike       -2.37        .53
Dislike very much      -2.49        .64
Dislike intensely      -3.33       1.39
Dislike extremely      -4.32       1.86

From: Jones and Thurstone (1955). See Section VIII-A 5.


Table VIII-E-5

Degrees of Good and Poor

Phrase               Scale Value    SD

Exceptionally good     18.56       2.36
Extremely good         18.44       1.61
Unusually good         17.08       2.43
Remarkably good        16.68       2.19
Very good              15.44       2.77
Quite good             14.44       2.76
Good                   14.32       2.08
Moderately good        13.44       2.23
Reasonably good        12.92       2.93
Fairly good            11.96       2.42
Slightly good          11.84       2.19
So-so                  10.08       1.87
Not very good           6.72       2.82
Moderately poor         6.44       1.64
Reasonably poor         6.32       2.46
Slightly poor           5.92       1.96
Poor                    5.72       2.09
Fairly poor             5.64       1.68
Quite poor              4.80       1.44
Unusually poor          3.20       1.44
Very poor               3.12       1.17
Remarkably poor         2.88       1.74
Exceptionally poor      2.52       1.19
Extremely poor          2.08       1.19

From: Myers and Warner (1968). See Section VIII-A 8.

Table VIII-E-6

Degrees of Good and Bad

Phrase            Scale Value

Extremely good      3.449
Very good           3.250
Unusually good      3.243
Decidedly good      3.024
Quite good          2.880
Rather good         2.755
Good                2.???
Pretty good         2.622
Somewhat good       2.462
Slightly good       2.4??
Slightly bad        1.497
Somewhat bad        1.323
Rather bad          [illegible]
Bad                 1.024
Pretty bad          1.018
Quite bad            .924
Decidedly bad        .797
Unusually bad        .662
Very bad             .639
Extremely bad        .470

From: Cliff (1959). See Section VIII-A 2.


Table VIII-E-7

Degrees of Agree and Disagree

Phrase                    Mean     SD

Decidedly agree           2.77     .41
Quite agree               2.37     .49
Considerably agree        2.21     .42
Substantially agree       2.10     .50
Moderately agree          1.47     .41
Somewhat agree             .94     .41
Slightly agree             .67     .36
Perhaps agree              .52     .46
Perhaps disagree          -.43     .46
Slightly disagree         -.64     .38
Somewhat disagree         -.93     .47
Moderately disagree      -1.35     .42
Quite disagree           -2.16     .57
Substantially disagree   -2.17     .51
Considerably disagree    -2.17     .45
Decidedly disagree       -2.76     .43

From: Altemeyer (1970). See Section VIII-A 1.


Table VIII-E-8

Degrees of More and Less

                                   Interquartile
Phrase              Scale Value      Range(a)

Very much more         8.02            .61
Much more              7.67           1.04
A lot more             7.50           1.06
A good deal more       7.29            .98
More                   6.33           1.01
Somewhat more          6.25            .98
A little more          6.00            .58
Slightly more          5.99            .57
Slightly less          3.97            .56
A little less          3.96            .54
Less                   3.64           1.04
Much less              2.55           1.06
A good deal less       2.44           1.11
A lot less             2.36           1.03
Very much less         1.96            .52

From: Dodd and Gerberick (1960). See Section VIII-A 3.

(a) Minimum = 0.5.


Table VIII-E-9

Degrees of Adequate and Inadequate

Phrase                       Mean       SD

Totally adequate             4.620      .846
Absolutely adequate          4.540      .921
Completely adequate          4.490      .825
Extremely adequate           4.412      .719
Exceptionally adequate       4.330      .869
Entirely adequate            4.340      .863
Wholly adequate              4.314     3.038
Fully adequate               4.294      .914
Very very adequate           4.063      .?76
Perfectly adequate           3.922     1.026
Highly adequate              3.843      .606
Most adequate                3.84?      .978
Very adequate                3.421      .851
Decidedly adequate           3.140     1.536
Considerably adequate        3.020      .874
Quite adequate               2.980      .979
Largely adequate             2.863      .991
Substantially adequate       2.608     1.030
Reasonably adequate          2.412      .771
Pretty adequate              2.306      .862
Rather adequate              1.755      .893
Mildly adequate              1.571      .670
Somewhat adequate            1.327      .793
Slightly adequate            1.200      .566
Barely adequate               .627      .928
Neutral                       .000      .000
Borderline                   -.020      .316
Barely inadequate           -1.157      .638
Mildly inadequate           -1.353      .621
Slightly inadequate         -1.380      .772
Somewhat inadequate         -1.882      .732
Rather inadequate           -2.102      .974
Moderately inadequate       -2.157     1.017
Fairly inadequate           -2.216      .800
Pretty inadequate           -2.347      .959
Considerably inadequate     -3.600      .680
Very inadequate             -3.735      .777
Decidedly inadequate        -3.780      .944
Most inadequate             -2.980     1.545
Highly inadequate           -4.1??      .741
Very very inadequate        -4.460      .537
Extremely inadequate        -4.608      .527
Fully inadequate            -4.667      .676
Exceptionally inadequate    -4.680      .508
Wholly inadequate           -4.784      .498
Entirely inadequate         -4.792      .644
Completely inadequate       -4.800      .529
Absolutely inadequate       -4.880      .431
Totally inadequate          -4.900      .412

From: Matthews, Wright, and Yudowitch (1975). See Section VIII-A 6.

Table VIII-E-10

Degrees of Acceptable and Unacceptable

Phrase                         Mean       SD

Wholly acceptable              4.725      .563
Completely acceptable          4.686      .610
Fully acceptable               4.412      .867
Extremely acceptable           4.392      .716
Most acceptable                4.157      .915
Very very acceptable           4.157      .825
Highly acceptable              4.040      .631
Quite acceptable               3.216      .956
Largely acceptable             3.137      .991
Acceptable                     2.392     1.456
Reasonably acceptable          2.294     [illegible]
Moderately acceptable          2.280      .722
Pretty acceptable              2.000     1.125
Rather acceptable              1.939      .813
Fairly acceptable              1.840      .924
Mildly acceptable              1.686      .700
Somewhat acceptable            1.458     1.241
Barely acceptable              1.078      .51?
Slightly acceptable            1.039      .522
Sort of acceptable              .140      .645
Borderline                      .000      .200
Neutral                         .000      .000
Marginal                       -.120      .515
Barely unacceptable           -1.100      .300
Slightly unacceptable         -1.255      .589
Somewhat unacceptable         -1.765      .674
Rather unacceptable           -2.020      .836
Fairly unacceptable           -2.160      .880
Moderately unacceptable       -2.340      .681
Pretty unacceptable           -2.412      .662
Reasonably unacceptable       -2.440      .753
Unacceptable                  -2.667     1.381
Substantially unacceptable    -3.235      .899
Quite unacceptable            -3.388     1.066
Largely unacceptable          -3.392      .818
Considerably unacceptable     -3.440      .779
Notably unacceptable          -3.500     1.044
Decidedly unacceptable        -3.837     1.017
Highly unacceptable           -4.2??      .535
Most unacceptable             -4.420      .724
Very very unacceptable        -4.490      .500
Exceptionally unacceptable    -4.540      .607
Extremely unacceptable        -4.686      .464
Completely unacceptable       -4.900      .361
Entirely unacceptable         -4.900      .361
Wholly unacceptable           -4.922      .269
Absolutely unacceptable       -4.922      .334
Totally unacceptable          -4.941      .235

From: Matthews, Wright, and Yudowitch (1975). See Section VIII-A 6.


Table VIII-E-11

Comparison Phrases

Phrase                   Mean       SD

Best of all              4.896      .510
Absolutely best          4.843      .459
Truly best               4.600      .721
Undoubtedly best         4.569      .823
Decidedly best           4.373      .839
Best                     4.216     1.459
Absolutely better        4.060      .988
Extremely better         3.922      .882
Substantially best       3.700      .922
Decidedly better         3.412      .933
Conspicuously better     3.059      .802
Moderately better        2.255      .737
Somewhat better          1.843      .801
Rather better            1.816      .719
Slightly better          1.157      .776
Barely better             .961      .656
Absolutely alike          .588     1.623
Alike                     .216      .847
The same                  .157      .801
Neutral                   .000      .000
Borderline               -.061      .314
Marginal                 -.184      .919
Barely worse            -1.039      .816
Slightly worse          -1.216      .498
Somewhat worse          -2.078      .860
Moderately worse        -2.220      .944
Noticeably worse        -2.529     1.036
Worse                   -2.667     1.423
Notably worse           -3.020     1.038
Largely worse           -3.216     1.108
Considerably worse      -3.275     1.206
Conspicuously worse     -3.27?      .887
Much worse              -3.286      .808
Substantially worse     -3.460      .899
Decidedly worse         -3.760      .907
Very much worse         -3.941      .752
Absolutely worse        -4.431      .823
Decidedly worst         -4.431      .748
Undoubtedly worst       -4.510      .872
Absolutely worst        -4.686     1.29?
Worst of all            -4.776     1.298

From: Matthews, Wright, and Yudowitch (1975). See Section VIII-A 6.


Table VIII-E-12

Degrees of Satisfactory and Unsatisfactory

Phrase                       Scale Value    SD

Quite satisfactory              4.35        .95
Satisfactory                    3.69        .87
Not very satisfactory           2.11        .76
Unsatisfactory but usable       2.00        .87
Very unsatisfactory              .69       1.32

From: U.S. Army (1973). See Section VIII-A 9.


Table VIII-E-13

Degrees of Unsatisfactory

Phrase                       Scale Value

Unsatisfactory                  1.47
Quite unsatisfactory            1.00
Very unsatisfactory              .75
Unusually unsatisfactory         .75
Highly unsatisfactory            .71
Very, very unsatisfactory        .25
Extremely unsatisfactory         .10
Completely unsatisfactory        .00

From: Mosier (1941). See Section VIII-A 7.


Table VIII-E-14

Degrees of Pleasant

Phrase                 Scale Value

Extremely pleasant       3.490
Very pleasant            3.174
Unusually pleasant       3.107
Decidedly pleasant       3.028
Quite pleasant           2.849
Pleasant                 2.770
Rather pleasant          2.743
Pretty pleasant          2.738
Somewhat pleasant        2.505
Slightly pleasant        2.440

From: Cliff (1959). See Section VIII-A 2.


Table VIII-E-15

Degrees of Agreeable

Phrase                  Scale Value

Very, very agreeable      5.34
Extremely agreeable       5.10
Highly agreeable          5.02
Completely agreeable      4.96
Unusually agreeable       4.86
Very agreeable            4.82
Quite agreeable           4.45
Agreeable                 4.19

From: Mosier (1941). See Section VIII-A 7.





Table VIII-E-16

Degrees of Desirable

Phrase                  Scale Value

Very, very desirable      5.66
Extremely desirable       5.42
Completely desirable      5.38
Unusually desirable       5.23
Highly desirable          5.15
Very desirable            4.96
Quite desirable           4.76
Desirable                 4.50

From: Mosier (1941). See Section VIII-A 7.


Table VIII-E-17

Degrees of Nice

Phrase             Scale Value

Extremely nice       3.351
Unusually nice       3.155
Very nice            3.016
Decidedly nice       2.969
Pretty nice          2.767
Quite nice           2.738
Nice                 2.636
Rather nice          2.568
Somewhat nice        2.438
Slightly nice        2.286

From: Cliff (1959). See Section VIII-A 2.


Table VIII-E-18

Degrees of Adequate

Phrase                Scale Value    SD

More than adequate       4.13       1.11
Adequate                 3.39        .87
Not quite adequate       2.40        .85
Barely adequate          2.10        .84
Not adequate             1.83        .98

From: U.S. Army (1973). See Section VIII-A 9.


Table VIII-E-19

Degrees of Ordinary

Phrase                Scale Value

Ordinary                2.074
Very ordinary           2.073
Somewhat ordinary       2.038
Rather ordinary         2.034
Pretty ordinary         2.026
Slightly ordinary       1.980
Decidedly ordinary      1.949
Extremely ordinary      1.936
Unusually ordinary      1.875

From: Cliff (1959). See Section VIII-A 2.


Table VIII-E-20

Degrees of Average

Phrase               Scale Value

Rather average         2.172
Average                2.145
Quite average          2.101
Pretty average         2.094
Somewhat average       2.080
Unusually average      2.062
Extremely average      2.052
Very average           2.039
Slightly average       2.023
Decidedly average      2.020

From: Cliff (1959). See Section VIII-A 2.


Table VIII-E-21

Degrees of Hesitation

                                              Interquartile
Phrase                          Scale Value     Range(a)

Without hesitation                 7.50           6.54
With little hesitation             5.83           3.40
Hesitant                           4.77           1.06
With some hesitation               4.38           1.60
With considerable hesitation       3.29           3.39
With much hesitation               3.20           5.25
With great hesitation              ?.41           6.00

From: Dodd and Gerberick (1960). See Section VIII-A 3.

(a) Minimum = 0.5.


Table VIII-E-22

Degrees of Inferior

Phrase                Scale Value

Slightly inferior       1.520
Somewhat inferior       1.516
Inferior                1.323
Rather inferior         1.295
Pretty inferior         1.180
Quite inferior          1.127
Decidedly inferior      1.013
Unusually inferior
Very inferior
Extremely inferior

From: Cliff (1959). See Section VIII-A 2.


Table VIII-E-23

Degrees of Poor

Phrase              Scale Value

Poor                  1.60
Quite poor            1.30
Very poor             1.18
Unusually poor         .95
Extremely poor         .95
Completely poor        .92
Very, very poor        .55

From: Mosier (1941). See Section VIII-A 7.


Table VIII-E-24

Descriptive Phrases

                                             Interquartile
Phrase                         Scale Value     Range(a)

Complete                          8.85            .65
Extremely vital                   8.79            .84
Very certain                      8.55           1.05
Very strongly                     8.40           1.04
Very crucial                      8.29           1.12
Very important                    8.22           1.16
Very sure                         8.15            .95
Almost complete                   8.06            .58
Of great importance               8.05            .91
Very urgent                       8.00            .90
Feel strongly toward              7.80           1.60
Essential                         7.58           1.85
Very vital                        7.55           1.05
Certain                           7.13           1.44
Strongly                          7.07            .67
Important                         6.81           1.14
Good                              6.72           1.20
Urgent                            6.41           1.53
Crucial                           6.39           1.73
Sure                              5.93           1.87
Vital                             5.92           1.63
Moderately                        5.24            .99
Now                               5.03            .53
As at present                     5.00            .50
Fair                              4.96            .77
Don't know                        4.82            .82
Undecided                         4.76           1.06
Don't care                        4.63           2.00
Somewhat                          3.79            .94
Indifferent                       3.70           2.20
Object strongly to                3.50           6.07
Not important                     3.09           1.33
Unimportant                       1.94           1.42
Bad                               2.83            .93
Uncertain                         2.83           2.50
Doesn't make any difference       2.83           3.13
Not sure                          2.82           1.24
Not certain                       2.64           2.62
Non-essential                     2.58           1.67
Doesn't mean anything             2.50           2.71
Insignificant                     2.12           1.14
Very little                       2.08            .64
Almost none                       2.04            .57
Very unimportant                  1.75           1.25
Only as a last resort             1.70           7.30
Very bad                          1.50           1.13
None                              1.11            .59

From: Dodd and Gerberick (1960). See Section VIII-A 3.

(a) Minimum = 0.5.


VIII-F  Page  1
1 Jul  76 


F.  Sample  Sets  of  Response  Alternatives 

Ready-made lists of response alternatives can save time. The
tables in this section give examples of response alternatives
that have been selected on different bases. These sets do not
exhaust all possibilities.

The sets of response alternatives in Table VIII-F-1 were selected
so that the phrases in each set would have means at least one
standard deviation away from each other and have parallel wording.
Some of the sets have extreme end points; some do not. The sets
in Table VIII-F-2 were selected so that the phrases in each set
would be as nearly equally distant from each other as possible,
without regard to parallel wording. Table VIII-F-3 contains sets
selected from lists of descriptors with only scale values given;
the phrases were selected on the basis of equal-appearing
intervals. Table VIII-F-4 has sets selected from order of merit
lists of descriptors.
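The "at least one standard deviation apart" criterion can be checked mechanically. The sketch below is illustrative only. It assumes that two neighboring phrases qualify when the difference between their scale values is at least as large as the larger of their two standard deviations; the manual does not state which standard deviation was used, so that rule and the function name `one_sd_apart` are assumptions. The values are copied from Table VIII-E-4.

```python
def one_sd_apart(phrases):
    """Return True if every pair of neighboring phrases is separated by
    at least the larger of the two phrases' standard deviations.

    `phrases` is a list of (label, scale_value, sd) tuples. The
    "larger of the two SDs" rule is an assumed reading of the criterion.
    """
    ordered = sorted(phrases, key=lambda p: p[1], reverse=True)
    for (_, upper_value, upper_sd), (_, lower_value, lower_sd) in zip(
        ordered, ordered[1:]
    ):
        if upper_value - lower_value < max(upper_sd, lower_sd):
            return False
    return True


# Five like/dislike phrases with values from Table VIII-E-4
# (Jones and Thurstone, 1955).
like_set = [
    ("Like extremely", 4.16, 1.62),
    ("Like moderately", 1.12, 0.61),
    ("Neutral", 0.02, 0.18),
    ("Dislike moderately", -1.20, 0.41),
    ("Dislike extremely", -4.32, 1.86),
]

# Two phrases from the same table that are too close to separate.
too_close = [
    ("Like extremely", 4.16, 1.62),
    ("Like intensely", 4.05, 1.59),
]

like_set_ok = one_sd_apart(like_set)
too_close_ok = one_sd_apart(too_close)
```

Under this reading the five-phrase set passes, while "Like extremely" and "Like intensely" fail because their scale values differ by only 0.11.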


Table VIII-F-1

Sets of Response Alternatives Selected so Phrases Are at Least
One Standard Deviation Apart and Have Parallel Wording

Set
No.    Response Alternatives

 1.    Completely acceptable
       Reasonably acceptable
       Barely acceptable
       Borderline
       Barely unacceptable
       Reasonably unacceptable
       Completely unacceptable

 2.    Wholly acceptable
       Largely acceptable
       Borderline
       Largely unacceptable
       Wholly unacceptable

 3.    Largely acceptable
       Barely acceptable
       Borderline
       Barely unacceptable
       Largely unacceptable

 4.    Reasonably acceptable
       Slightly acceptable
       Borderline
       Slightly unacceptable
       Reasonably unacceptable

 5.    Totally adequate
       Very adequate
       Barely adequate
       Borderline
       Barely inadequate
       Very inadequate
       Totally inadequate

 6.    Completely adequate
       Considerably adequate
       Borderline
       Considerably inadequate
       Completely inadequate

 7.    Very adequate
       Slightly adequate
       Borderline
       Slightly inadequate
       Very inadequate

 8.    Highly adequate
       Mildly adequate
       Borderline
       Mildly inadequate
       Highly inadequate

 9.    Decidedly agree
       Substantially agree
       Slightly agree
       Slightly disagree
       Substantially disagree
       Decidedly disagree

10.    Moderately agree
       Perhaps agree
       Neutral
       Perhaps disagree
       Moderately disagree

11.    Undoubtedly best
       Conspicuously better
       Moderately better
       Alike
       Moderately worse
       Conspicuously worse
       Undoubtedly worst

12.    Moderately better
       Barely better
       The same
       Barely worse
       Moderately worse

13.    Extremely good
       Remarkably good
       Good
       So-so
       Poor
       Remarkably poor
       Extremely poor

14.    Exceptionally good
       Reasonably good
       So-so
       Reasonably poor
       Exceptionally poor

15.    Very important
       Important
       Not important
       Very unimportant

16.    Like extremely
       Like moderately
       Neutral
       Dislike moderately
       Dislike extremely

17.    Strongly like
       Like
       Neutral
       Don't like
       Strongly dislike

18.    Very much more
       A good deal more
       A little more
       A little less
       A good deal less
       Very much less

Table VIII-F-2

Sets of Response Alternatives Selected so That
Intervals Between Phrases Are as Nearly Equal as Possible

Set
No.    Response Alternatives

 1.    Completely acceptable
       Reasonably acceptable
       Borderline
       Moderately unacceptable
       Extremely unacceptable

 2.    Totally adequate
       Pretty adequate
       Borderline
       Pretty inadequate
       Extremely inadequate

 3.    Highly adequate
       Rather adequate
       Borderline
       Somewhat inadequate
       Decidedly inadequate

 4.    Quite agree
       Moderately agree
       Perhaps agree
       Perhaps disagree
       Moderately disagree
       Substantially disagree

 5.    Undoubtedly best
       Moderately better
       Borderline
       Noticeably worse
       Undoubtedly worst

 6.    Fantastic
       Delightful
       Nice
       Mediocre
       Unpleasant
       Horrible

 7.    Perfect in every respect
       Very good
       Good
       Could use some minor changes
       Not very good
       Better than nothing
       Extremely poor

 8.    Excellent
       Good
       Only fair
       Poor
       Terrible

 9.    Extremely good
       Quite good
       So-so
       Slightly poor
       Extremely poor

10.    Remarkably good
       Moderately good
       So-so
       Not very good
       Unusually poor

11.    Without hesitation
       With little hesitation
       With some hesitation
       With great hesitation

12.    Strongly like
       Like quite a bit
       Like
       Neutral
       Mildly dislike
       Dislike very much
       Dislike extremely

13.    Like quite a bit
       Like
       Like slightly
       Borderline
       Dislike slightly
       Dislike moderately
       Don't like

14.    Like quite a bit
       Like fairly well
       Borderline
       Dislike moderately
       Dislike very much

15.    Very much more
       A little more
       Slightly less
       Very much less

Table VIII-F-3

Sets of Response Alternatives Selected
from Lists Giving Scale Values Only

Set
No.    Response Alternatives

 1.    Very, very agreeable
       Unusually agreeable
       Quite agreeable
       Agreeable

 2.    Rather average
       Quite average
       Unusually average
       Decidedly average

 3.    Very, very desirable
       Completely desirable
       Very desirable
       Desirable

 4.    Extremely good
       Somewhat good
       Slightly bad
       Extremely bad

 5.    Slightly inferior
       Rather inferior
       Unusually inferior
       Extremely inferior

 6.    Extremely nice
       Decidedly nice
       Nice
       Slightly nice

 7.    Ordinary
       Slightly ordinary
       Unusually ordinary

 8.    Extremely pleasant
       Decidedly pleasant
       Somewhat pleasant

 9.    Poor
       Very poor
       Very, very poor

10.    Very, very agreeable
       Extremely agreeable
       Very agreeable
       Quite agreeable
       Agreeable

Note. Selected so that intervals between phrases are as equal
as possible.


Table VIII-F-4

Sets of Response Alternatives Selected
Using Order of Merit Lists of Descriptor Terms

Set
No.    Response Alternatives

 1.    Very good
       Good
       Borderline
       Poor
       Very poor

 2.    Very satisfactory
       Satisfactory
       Borderline
       Unsatisfactory
       Very unsatisfactory

 3.    Very superior
       Superior
       Borderline
       Poor
       Very poor

 4.    Extremely useful
       Of considerable use
       Of use
       Not very useful
       Of no use


IX-A  Page  1
1 Jul  76 


Chapter  IX:  Physical  Characteristics  of  Questionnaires 

A.  Overview 


This chapter considers four topics related to the physical
characteristics of questionnaires: the location of response
alternatives relative to the stem (Section IX-B); questionnaire
length (Section IX-C); questionnaire format considerations
(Section IX-D); and the use of answer sheets (Section IX-E).


IX-B  Page  1 
1 Jul  76 


B.  Location  of  Response  Alternatives  Relative  to  the  Stem 

Research to determine what effect the location of response
alternatives relative to the question stem has on subjects'
responses is practically nonexistent. There is some evidence,
however, that untrained raters can make relatively error-free
graphic ratings regardless of whether the "good" end of the
scale is at the left, right, top, or bottom.

In designing a specific questionnaire, the following points
should be considered regarding the location of response
alternatives relative to the stem:

1.  With multiple choice items, the response alternatives are
usually arranged vertically under the stem as shown in
Section IV-C 2. With a large number of response alternatives,
two or more columns of vertically arranged alternatives might
be used. Sometimes, if there are only two or three alternatives
(such as "Yes" and "No"), they are placed horizontally
rather than vertically.

2.  Graphic rating scales are usually placed horizontally on a
page. However, the descriptive words, phrases, or sentences
on a scale should be concentrated as much as possible at
specific points on the scale. This is usually easier if the
scales are placed vertically on the page, but it can be done
either way. Descriptors need not be equally spaced along
graphic scales, and should not be if there is reason to
believe the psychological distances between them are not
equal.

3.  With nongraphic (or "numerical") rating scale items and with
ranking and forced choice items, the response alternatives
are usually placed vertically under the question stem. See
examples in Chapter IV. Sometimes rating scale items are
placed horizontally under the stem as shown in Section VII-B.

If a number of rating scale items all use the same response
alternatives, the question stems can be presented in a
column with the response alternatives to the right as shown
in Figure IX-B-1.

In Figure IX-B-1 the response alternatives have been rotated
90 degrees to save space. An effort should be made to place
the response alternatives parallel to the bottom of the
page so that the respondent does not need to turn the page
sideways to read them.

4.  The  response  alternatives  for  semantic  differential  items  are 
usually  placed  horizontally  on  the  page.  For  an  example, 
see  Section  IV-H. 
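Point 2 above notes that descriptors on a graphic scale need not be equally spaced when the psychological distances between them are unequal. One way to act on that, sketched here as an assumption rather than a method prescribed by this manual, is to place each descriptor along the printed line in proportion to its scale value. The function name `descriptor_positions` and the 100-unit line length are illustrative; the scale values are copied from Table VIII-E-3.

```python
def descriptor_positions(scale_values, line_length=100.0):
    """Map each descriptor's scale value to a distance along the line.

    The lowest-valued descriptor lands at 0 and the highest at
    `line_length`; the others fall in proportion to their scale values.
    """
    low = min(scale_values.values())
    high = max(scale_values.values())
    span = high - low
    return {
        phrase: (value - low) / span * line_length
        for phrase, value in scale_values.items()
    }


# Scale values copied from Table VIII-E-3 (Jones and Thurstone, 1955).
five_point_set = {
    "Excellent": 3.71,
    "Good": 1.91,
    "Only fair": 0.71,
    "Poor": -1.55,
    "Terrible": -3.09,
}

positions = descriptor_positions(five_point_set)

# Print the descriptors from the low end of the line to the high end.
for phrase, distance in sorted(positions.items(), key=lambda item: item[1]):
    print(f"{distance:6.1f}  {phrase}")
```

Which end of the line is treated as the "good" end is an arbitrary choice in this sketch; the evidence cited in this section suggests raters cope with either orientation.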


Figure IX-B-1

Arrangement of Items With Same
Rating Scale Response Alternatives

1.  How satisfied or dissatisfied are you with each of the following
factors or things?

[Column headings: the rating scale response alternatives, printed
rotated 90 degrees above the answer columns; not legible in this
copy.]

a.  Type of furniture in barracks.
b.  Medical service to soldiers.
c.  Quality of mess hall food.
d.  Leadership of generals.
e.  Opportunity for promotion.
f.  Army pay.
g.  Civilian opinion of Army.


IX-C  Page  1
1 Jul  76


C.  Questionnaire Length

1.  General


The length of questionnaires used in field tests has ranged
from one page to as many as [number illegible] pages, perhaps
more. How long can one expect a respondent to work effectively
at the questionnaire-answering task? At what point do attention
and motivation start to degrade, thereby producing poorly
considered responses or the omission of responses? Research
information on this point is not available to provide a basis
for a firm recommendation. There is even disagreement on the
effect of questionnaire length on the response rate to mailed
questionnaires. However, questionnaires which require longer
than one hour to complete will, in most situations, cause
boredom and indifference. Even 10 or 15 minutes may be too
long if the questionnaire is perceived by the respondent as
redundant or as asking unnecessary questions. If one is
concerned over the effects of a long questionnaire, alternate
forms should be used, wherein the order of items is reversed
(or approximately so). For example, the items answered last
on 50% of the forms would be answered first on the other 50%
of the forms. One could also split the respondent group in
half and give half of the questions to each group, provided
that the two groups were fairly equivalent in relevant
characteristics. It is assumed that everything else would
already have been done to reduce the number of items before
one of these approaches is used.
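The alternate-forms approach described above can be sketched in a few lines. This is an illustration under stated assumptions, not a procedure taken from the manual: the function names, the form labels "A" and "B", and the rule of alternating forms down the roster are all hypothetical choices.

```python
def make_alternate_forms(items):
    """Return two forms: the original item order and its exact reverse."""
    form_a = list(items)
    form_b = list(reversed(items))
    return form_a, form_b


def assign_forms(respondent_ids):
    """Alternate forms A and B down the roster so that each form goes
    to about half of the respondents (an assumed assignment rule)."""
    return {
        respondent: ("A" if index % 2 == 0 else "B")
        for index, respondent in enumerate(respondent_ids)
    }


# Hypothetical six-item questionnaire and eight-person roster.
items = [f"Item {number}" for number in range(1, 7)]
roster = [f"R{number:02d}" for number in range(1, 9)]

form_a, form_b = make_alternate_forms(items)
assignment = assign_forms(roster)
```

With this arrangement the items that come last on form A come first on form B, so any fatigue effect late in the questionnaire is spread across different items rather than concentrated on the same ones.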

2.  Results of a Recent Study

In a 1976 study, ARI assisted TCATA in obtaining and analyzing
questionnaire responses from a group of trainees whose duration
and location of basic and advanced individual training was
handled differently from the usual. The number of trainees
answering items 1-7 and 48-54 of a 54 item questionnaire is
shown below. Note that there is very little drop in the number
of men in either group as we skip from items 1-7 to items 48-52.
This suggests that a 50 item questionnaire, administered as this
was, was not so long that persons stopped responding after
answering successively more questions.

Now note the sharp drop (about 15% and [illegible]% for the two
groups) in responses to items 53 and 54. A more gradual decrease
in the number of people responding is what one would expect if
they were being "worn down" or fatigued by excessive length.

This result was puzzling, but then it was noted that items
53 and 54 are alone together on the tenth and final page
of the questionnaire. It is speculated that many/most of
those not answering items 53 and 54 turned page 10 over
along with page 9 and thought they had answered all that
was required of them. No one checked their questionnaires
when they were handed in to see if they had left any items
blank. The reduction in respondents appears more of a
"last page phenomenon" than a consequence of an excessively
long questionnaire.

Item #    Experimental Group    Control Group

   1             716                 512
   2             716                 513
   3             717                 511
   4             714                 513
   5             716                 514
   6             713                 510
   7             716                 511
   .              .                   .
   .              .                   .
   .              .                   .
  48             707                 509
  49             707                 508
  50             707                 508
  51             707                 510
  52             698                 505
  53             593                 462
  54             604                 461
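
The distinction drawn above between a gradual decline and an abrupt "last page" drop can be checked with simple arithmetic on the counts for items 48-54. The sketch below is illustrative only; the function names and the 5 percent threshold for calling a drop "abrupt" are assumptions, not values taken from the study.

```python
# Respondent counts per item, copied from the table above (items 48-54).
counts = {
    "experimental": {48: 707, 49: 707, 50: 707, 51: 707, 52: 698, 53: 593, 54: 604},
    "control": {48: 509, 49: 508, 50: 508, 51: 510, 52: 505, 53: 462, 54: 461},
}

def percent_drops(series):
    """Return {item: percent drop from the previous item} for consecutive items."""
    items = sorted(series)
    return {
        cur: 100.0 * (series[prev] - series[cur]) / series[prev]
        for prev, cur in zip(items, items[1:])
    }

def abrupt_drops(series, threshold=5.0):
    """Items whose count fell by more than `threshold` percent in one step.

    The 5 percent default is an assumed cutoff chosen for illustration.
    """
    return [item for item, drop in percent_drops(series).items() if drop > threshold]

for group, series in counts.items():
    print(group, abrupt_drops(series))
```

In both groups the only step exceeding the assumed threshold is the one from item 52 to item 53, which coincides with the start of the final page rather than with a steady wearing down across items.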

IX-D  Page  1 
1 Jul  76 


D.  Questionnaire Format Considerations

This  section  addresses  the  format  of  questionnaire  items,  title 
and  other  identification  marks,  printed  introductions,  planning 
to  facilitate  processing,  and  other  questionnaire  format 
considerations.

1.  Format  of  Questionnaire  Items  and  Format  Bias 

Item format biases occur when responses to items (questions)
are influenced by the question stem or response alternatives.
The following guidance is provided:

a.  The  format  of  all  questionnaire  items  on  a questionnaire 
should  be  consistent  whenever  possible.  Mixing  multiple 
choice  questions,  open  ended  questions,  scales,  etc.,  is 
normally  not  desirable. 

b.  Punctuation and question structure should be consistent
and in accordance with proper sentence structure
principles. Where incomplete sentences (e.g., "The
training that I have received at Fort Hood has been"
with five response alternatives of "very challenging"
through "very unchallenging") are used as stems, no
extraneous punctuation, such as a colon, need be put at
the end of the stem. The first word of the response
alternatives should not be capitalized unless it
would be if the statement were written as a continuous
sentence. Terminal punctuation at the end of the
response alternatives should follow the same general
rule of consistency with normal sentence structure.
Hence, a period would ordinarily be placed after each
response alternative.

When an item consists of a complete question (e.g.,
"How satisfied or dissatisfied are you with the furniture
in the barracks?") the first word of the response
alternatives should be capitalized since they do not continue
a sentence. If the response alternatives constitute
complete sentences, then they should have periods at the
end, or whatever other terminal punctuation is
appropriate. Sometimes periods are placed at the end of
extremely long response alternatives even if they are
not sentences. Ordinarily, then, with this form of
items, periods would not be placed after the response
alternatives.


IX-D  Page  2 
1 Jul  76 


Exceptions  to  the  above  suggestions  should  be  made 
whenever  the  exception  would  improve  clarity.  An 
example  might  be  when  periods  would  be  confused  with 
decimal  points. 

c.  When  items  are  ambiguous,  a recognizable  pattern  of 
responses  is  often  produced. 

d.  Item  format  bias  may  be  a function  of  how  items  are 
sequenced  and  grouped. 

e.  Some  authors  conclude  that  a bias  can  be  expected  from 
all  closed-ended  questions  where  answers  must  be 
selected  from  two  or  more  fixed  choices. 

f.  The  paired  comparison  format  may  be  useful  for  those 
respondents  who  tend  to  check  many  items  from  a list, 
and  for  those  who  check  only  a few. 

g.  Card sorting may show the least item format bias.

h.  With two-way choices, some respondents have a tendency
to  select  the  first  alternative.  Others  have  a tendency 
to  select  the  second.  With  other  multiple  choice  items, 
some  respondents  have  a tendency  to  select  certain 
categories. 

i.  There  is  a little  evidence  that  the  first  alternative 
for  an  item  is  chosen  somewhat  more  frequently  than  the 
others. 

2.  Title  and  Other  Identification  Marks 


Each questionnaire should carry a descriptive title centered
at the top of the first page of questions and on the
instructional and/or introductory cover page if such is
used. Each questionnaire form should also be designated
by form number to distinguish it from other forms. This
number usually goes in the upper left hand corner of each
page.

3.  Printed  Introductions 

Introductions are sometimes printed at the start of a
questionnaire to tell respondents the purpose and importance
of the questionnaire, and the importance of their cooperation
in answering all questions carefully. Methodological research
is needed to determine the effectiveness of such
introductions, but if they are too lengthy there is always the
possibility that they might be counterproductive. Regardless,


IX-D  Page  3 
1 Jul  76


if the introduction is going to run more than a quarter of
a page, it might better be placed on a cover sheet.

See  Section  X-B  about  questionnaire  instructions. 

4.  Planning  to  Facilitate  Processing 

Where possible, questionnaires should be planned to
facilitate data collection, processing, and analyses.
This frequently involves formulating the questionnaire
for machine processing. For small samples, however, manual
processing should normally be employed since the effort
needed to plan for machine processing is not justified by
anticipated data reduction time savings. How to format
a questionnaire for machine processing is outside the
current scope of this manual. See Section IX-E regarding
the use of answer sheets.

5.  Other  Questionnaire  Format  Considerations 

a.  If  the  respondent's  name,  rank,  etc.,  is  really  needed, 
ask for it on the front page. (See also Section X-C.)
Sometimes  other  information  is  needed  about  a respondent 
so  that  it  can  be  correlated  with  his  responses.  This 
may  include  duty  MOS,  special  army  training,  combat 
experience,  etc.  If  it  is  really  needed,  it  is  usually 
asked  for  on  the  front  page  along  with  name. 

b.  If a questionnaire has over two pages, numeric page
numbers should be used. They are ordinarily put at the
center bottom of each page.

c.  A questionnaire  should  not  be  crowded  or  cluttered  in 
appearance. If it is, certain items might be missed.

d.  Each item in a questionnaire should be numbered or
lettered so it can be identified and referred to.

e.  Sufficient room should be left for the respondent to write
in his answers to open-ended questions.

f.  Directions should be well displayed and unmistakably
clear.

g.  It is usually preferable to print the questionnaire in
booklet form on both sides of the page, rather than have
it duplicated on one side of the page and corner-stapled.


IX-D  Page  4
1 Jul  76 


h.  There  is  research  evidence  that  an  attractive 
questionnaire  increases  response  rates. 

i.  Different  colored  pages  or  questionnaire  forms  may 
aid  in  the  sorting  of  data  and  may  have  appeal  to  the 
respondents. 




IX-E  Page  1 
1 Jul  76 

E.  Use  of  Answer  Sheets 

As  noted  in  Section  IX-D  4,  when  possible,  questionnaires  should 
be  designed  to  facilitate  data  collection,  processing,  and 
analyses. Hence, if the number of questions warrants it,
consideration should be given to the use of separate answer sheets. An
answer sheet can be designed for either hand or machine processing.
A number of standard machine processable answer sheets are
available, and copies will be included in a subsequent updating of
this manual.

When  considering  the  possible  use  of  answer  sheets,  the 
following  points  should  be  kept  in  mind: 

1.  The  use  of  a separate  answer  sheet  may  require  a different 
set of abilities than responding on the questionnaire itself.

2.  Depending  upon  their  prior  experiences  with  them,  respondents 
may  find  it  more  difficult  to  use  a separate  answer  sheet 
than  to  respond  on  the  questionnaire  sheet. 

3.  It  is  normally  more  difficult  and  time  consuming  for  the 
respondent  to  use  a separate  answer  sheet.  (However, 
separate  answer  sheets  have  been  used  successfully  for  some 
purposes  with  fourth  grade  children). 

4.  When  separate  answer  sheets  are  employed,  the  questionnaire 
booklets  are  reusable. 

5.  Respondents sometimes err in using the last spaces on a
multiple choice answer sheet when there are more spaces than
response alternatives. This can be avoided by the use of
tailor-made sheets.




X-A  Page  1 
1 Jul  76 


Chapter  X:  Considerations  Related  to  Questionnaire  Administration 

A.  Overview 


Considerations  related  to  the  administration  of  questionnaires  are 
discussed  in  this  chapter,  since  such  matters  are  obviously  of 
concern  when  questionnaires  are  constructed.  Questionnaire 
instructions are discussed in Section X-B, anonymity for respondents
in Section X-C, and motivational factors related to questionnaire
administration in Section X-D. Administration time, characteristics
of administrators, and administrative conditions are the topics of
Sections X-E, X-F, and X-G, respectively. The training of raters and
other  evaluators  is  the  concern  of  Section  X-H,  while  other  factors 
related  to  questionnaire  administration  are  considered  in  Section  X-I. 


X-B  Page  1
1 Jul  76


B.  Instructions


Care must be exercised in preparing instructions for questionnaires
since  they  are  quite  likely  to  affect  the  way  the  respondent  answers 
the  questions.  For  example,  even  mildly  anger  arousing  printed 
instructions  may  elicit  responses  of  negativism. 

Although  further  research  is  needed  to  fully  determine  the 
influence  of  instructions  on  responses,  some  practical  guidelines 
can  be  offered: 

1.  It is sometimes preferred that an oral statement of
questionnaire purpose be given to respondents. If this is not
practical  or  a person  with  appropriate  credibility  and/or 
status  cannot  be  supplied  to  make  the  statements,  then  a 
printed  statement  must  suffice.  (See  Section  IX-D  3 regarding 
printed  introductions.) 

2.  Lengthy  instructions  for  completing  questionnaires  should  be 
avoided.  They  may  tend  to  confuse  the  respondent  rather 
than  help  him. 

3.  The option of orally presenting instructions is often
available. When oral instructions are given they are usually
given just prior to administering the questionnaire.

4.  If  instructions  are  given  orally  and  an  illustration  is  needed, 
a visual  display  should  be  available  which  may  include  a 
printed  version  of  more  complex  instructions. 

5.  When questionnaires are group administered, it should be
announced that aides will check each respondent's
questionnaire for completeness, if such a process can be implemented.

6.  "Cute" examples on instructions should not be used. They will
damage rapport and detract from the seriousness of the
questionnaires, particularly for more mature and older respondents. It
is best to use a neutral example that will be suitable for all
respondents.

7.  Obviously, instructions should be given in a way that all
respondents can understand them. Care should be exercised
about the level of vocabulary used.
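
The completeness check mentioned in guideline 5 can be sketched as follows for questionnaires whose answers have been recorded item by item. This is a minimal illustration under stated assumptions: the function name and the way an answer sheet is represented (a mapping from item number to the marked alternative) are hypothetical and not part of this manual.

```python
def blank_items(responses, item_numbers):
    """Return the item numbers a respondent left unanswered.

    `responses` maps item number -> marked alternative (e.g. "c");
    a missing key, None, or an empty string counts as blank.
    """
    return [n for n in item_numbers if not responses.get(n)]

# Hypothetical answer sheet for a 6-item questionnaire: items 5 and 6 were skipped.
sheet = {1: "a", 2: "c", 3: "c", 4: "e", 5: "", 6: None}
missing = blank_items(sheet, range(1, 7))
print(missing)
```

A check of this kind, run as questionnaires are handed in, gives the aide a list of specific items to point out to the respondent before he leaves.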

An  example  is  given  on  the  following  page  of  the  instructions 
that  might  precede  the  items  of  a questionnaire.  In  this 
example  the  responses  were  to  be  given  on  a separate  "answer" 
or  response  sheet. 


X-B  Page  2 
1 Jul  76 


TRAINING  ATTITUDE  QUESTIONNAIRE  (BASIC  AND  AIT) 

INSTRUCTIONS: The purpose of this questionnaire is to obtain information
from you regarding training, working and living while in the Army's Basic
Training and Advanced Individual Training (AIT) program. Your answers
will help the Army to determine what conditions are in need of
improvement, and will assist the Army in determining the actions they must take
to improve training and the quality of life for new soldiers in the Army.
Your honest opinions are, therefore, essential.

We  have  no  need  to  know  who  you  are  personally.  No  effort  will  be  made 
to  identify  either  you  or  your  unit.  DO  NOT  WRITE  YOUR  NAME,  SOCIAL 
SECURITY  NUMBER,  OR  UNIT  on  either  the  questionnaire  or  the  answer  sheet. 

Each  question  should  be  answered  by  circling  the  letter  on  your  answer 
sheet  which  is  next  to  the  answer  which  best  describes  your  feelings. 

See  sample  question  below: 

SAMPLE  QUESTION:  3.  How  old  are  you? 

a.  17 

b.  18 

c.  19 

d.  20 

e.  21  or  older 

If  you  are  19  years  old,  you  should  circle  the  letter  c on  your  answer 
sheet  for  question  3,  as  has  been  done  below,  since  the  letter  c 
corresponds  to  your  correct  age  of  19  on  the  questionnaire. 


QUESTION     RESPONSES
NUMBER       (CIRCLE ONE)

   01        a    b    c    d    e
   02        a    b    c    d    e
   03        a    b   (c)   d    e
   04        a    b    c    d    e

If  you  have  any  questions,  please  ask  the  questionnaire  administrator 
for  assistance.  You  will  have  30  minutes  to  complete  the  questionnaire. 
We  will  all  turn  in  our  answer  sheets  and  leave  at  the  same  time.  Do 
not  turn  the  page  and  start  to  work  until  instructed  to  do  so. 


X-C  Page  1 
1 Jul  76 

C.  Anonymity  for  Respondents 

1.  Factors  to  be  Considered 

There  are  several  factors  to  be  considered  when  deciding 
whether to require the respondent's name or other identifying
information on a questionnaire. Some of the factors are
supported  by  research,  while  others  are  not. 

a.  If the respondent supplies his name, he is aware that he
can be identified and called back. If respondents do not
have to give their names or similar information, most
will believe that they cannot be identified and called
back for any type of accounting after their
questionnaires have been collected.

b.  The  perception  of  anonymity  seems  to  depend  not  only 
upon  whether  a respondent  gives  his  name,  but  also 
on  the  conditions  under  which  the  questionnaires  are 
administered. For example, paper-and-pencil
questionnaires are more anonymous than structured interviews.

c.  The effects of anonymity seem to be related to the
content of the questionnaire. This is particularly
true when information on sensitive areas is collected.
For general attitudes, it may not matter.

d.  The  effects  of  anonymity  may  also  depend  upon  who 
administers  the  questionnaire,  and  the  circumstances 
under  which  it  is  administered.  Responses  may  be 
distorted  when  respondents  are  identified  and  under 
high  threat. 

e.  Respondents  may  be  more  lenient  when  rating  other 
personnel  if  they  think  they  will  be  identified. 

2.  Implications of the Privacy Act of 1974

If the experimenter, test officer, or questionnaire writer
desires to obtain certain types of personal information
from a respondent, the federal Privacy Act of 1974, in
turn, requires that certain information first be given to
the candidate respondent. One may use DA Form 4368-R,
1 May 75 for the purpose of communicating this information
to the respondent. The form is shown filled out on page
X-C 3. In this particular example the research questions
dealt with attitudes toward their treatment in the Army.


X-C  Page  2 
1 Jul  76 


A second example, Figure X-C-1, illustrates a more
compact format. The same elements of information called
for by DA Form 4368-R have been communicated; it's just
that that form was not used.

A privacy act statement is not necessarily required
as a part of all questionnaires that are administered to
Army personnel. It is not necessary where no personal
information is being requested, and where the individual
does not have to identify himself by name, SSAN, or other
mark or characteristics. For example, no invasion of
privacy is involved where soldiers are asked to anonymously
evaluate some new/revised weapon, equipment, or
organization regarding effectiveness and/or acceptability.


X-C  Page  3 
1 Jul  76 


3. ROUTINE USES

This is an experimental personnel data collection form developed by the
U.S. Army Research Institute for the Behavioral and Social Sciences pursuant
to its research mission as prescribed in AR 70-1. When identifiers (name or
Social Security Number) are requested they are to be used for administrative
and statistical control purposes only. Full confidentiality of the
responses will be maintained in the processing of these data.


4. MANDATORY OR VOLUNTARY DISCLOSURE AND EFFECT ON INDIVIDUAL NOT PROVIDING INFORMATION

Your  participation  in  this  research  is  strictly  voluntary.  Individuals  are 
encouraged  to  provide  complete  and  accurate  information  in  the  interests  of 
the  research,  but  there  will  be  no  effect  on  individuals  for  not  providing 
all  or  any  part  of  the  information.  This  notice  may  be  detached  from  the 
rest of the form and retained by the individual if so desired.


FORM  Privacy Act Statement  26 Sep 75

DA Form 4368-R, 1 May 76




X-C  Page  4 
1 Jul  76 


Figure  X-C-l 

A Second  Example  of  a Privacy  Act  Statement 
11B/C  GRADUATE  FIELD  SURVEY 

(Prescribing Directive: AR 600-46; TRADOC Ltr dtd 29 Aug 7i)


INFORMATION  PRIVACY  ACT  STATEMENT 

1.  Authority:  5 USC  301,  10  USC  3012,  Authority  for  the  Secretary 
of  the  Army  to  Issue  AR's;  44  USC  3101,  Authority  for  Collecting 
Necessary  Data. 

2.  Principal Purpose: To collect data to evaluate the effectiveness
of individual training received prior to joining one's initial
unit of assignment.

3.  Routine Uses: The data collected with this form are to be used
for research purposes only. They will not become a part of any
individual's record and will not be used in whole or in part in
making any determination about an individual.

The identifiers (name or Social Security Number) are to be used
for administrative and statistical control purposes only. Full
confidentiality of responses will be maintained in the processing
of these data.

4.  Mandatory or Voluntary Disclosure and Effect on Individual Not
Providing Information: Voluntary - Your participation in this
research is strictly voluntary. Individuals are encouraged to
provide complete and accurate information in the interests of
the research, but there will be no effect on individuals not
providing all or any part of the information.


This  notice  may  be  detached  from  the  rest  of  this  form  and 
retained by the individual answering the questionnaire if so
desired. 


X-D  Page  1
1 Jul  76


D.  Motivational Factors


This section considers the effects of lack of motivation, and
some  ways  of  providing  a desirable  level  of  motivation  to 
respondents  during  the  questionnaire  administration  process. 

1.  Effects of Lack of Motivation

Generally,  the  results  of  any  study  will  suffer  distortion 
if  those  to  whom  the  questionnaire  is  distributed  are  not 
sufficiently  motivated.  If  they  have  the  choice,  they  will 
not respond at all. If they do have to respond or are just
minimally  motivated,  they  may  omit  items,  make  patterned  or 
random  responses,  or  just  generally  respond  poorly.  As  a 
result,  the  reliability  and  validity  of  the  responses  will 
be  decreased  and  hence  the  results  of  the  study  left  open 
to  serious  question. 

2.  Ego Involving Potential Respondents in the Study

There  are  a number  of  ways  that  motivation  can  be  increased 
by  ego  involving  potential  respondents.  Some  of  the  ways 
are given below:

a.  The special role of the respondent in the study can be
emphasized. 

b.  Responsibility  can  be  stressed  when  it  is  appropriate 
to  do  so. 

c.  The  wording  of  cover  letters,  if  used,  affects  ego 
involvement.  Help  may  sometimes  be  requested  on  the 
basis  of  appealing  to  the  self  interests  of  the 
respondent.  There  is  evidence  that  this  type  of  appeal 
helps  most  with  less  educated  respondents. 

3.  Stimulating  the  Return  of  Remotely  Administered  Questionnaires 

Obviously, whatever ego involves potential respondents in a
study also stimulates the return of remotely administered
questionnaires,  such  as  those  distributed  by  mail.  Other 
ways  of  stimulating  the  return  or  response  rate  are: 

a.  Return rates may often be significantly improved when a
letter is sent in advance notifying the potential
respondent that he will receive a questionnaire and his
help is needed in filling it out.


X-D  Page  2 
1 Jul  76 


b.  Stamped and addressed return envelopes can be sent with
the  questionnaire.  There  is  evidence  that  this  does 
increase  response  rate. 

c.  There  is  contradictory  evidence  about  whether  short 
questionnaires  are  returned  more  frequently  than  longer 
ones,  but  one  would  intuitively  believe  it  to  be  true. 

d.  Followup  reminders  can  be  sent  to  those  who  do  not 
promptly  return  their  questionnaires.  There  is  some 
question,  however,  regarding  how  much  such  followups 
increase  response  rate.  At  times  it  may  not  be  cost 
effective,  so  maybe  the  decision  should  be  a function 
of  whether  or  not  the  initial  return  rate  was  adequate. 

4.  Use of Incentives

The  evidence  has  been  equivocal  regarding  the  extent  to 
which  motivation  is  increased  through  the  use  of  incentives. 
Incentives  may  include  money,  time  off,  special  privileges, 
etc.  Generally,  however,  it  is  agreed  that  incentives 
usually help increase the response rate with remotely
administered  questionnaires. 

5.  Other  Motivational  Factors  Related  to  Questionnaire  Administration 


Many additional motivational factors related to questionnaire
administration  could  be  noted  or  inferred  from  other  sections 
in  this  manual.  Some  of  them  are: 

a.  Respondents  often  have  preferences  for  certain  item  formats, 
although  sometimes  such  preferences  do  not  seem  to  have  an 
effect  on  results.  Some  subjects  prefer  rating  scales  to 
forced choice items. With forced choice some like the
option of indicating the degree of applicability of each
statement. Some do not like forced sort Q-sort. (See
Section IV-G.) Some prefer multiple category to two category
options. These preferences may relate to familiarity of the
respondent with given item types. There is not much that
the questionnaire designer can do about such preferences,
except to note that they exist.

b.  Motivation  may  be  increased  by  offering  feedback  of  study 
results  to  the  respondent. 

c.  Every effort should be made to praise the respondents or
potential  respondents,  to  the  extent  that  it  is  reasonable. 

d.  Long, vague, or boring questionnaire sessions should be
avoided, since they will decrease respondent motivation.


X-D  Page  3 
1 Jul  76 


e.  Questionnaire  administration  sessions  should  not  be 
scheduled  when  there  are  conflicts  with  other  activities 
of  greater  interest  to  the  respondents.  Nor,  in  general, 
should they be scheduled very early or very late in the
day. 

f.  Volunteers  are  usually  more  motivated  to  fill  out 
questionnaires  than  are  nonvolunteers.  However,  their 
replies  may  be  more  biased. 

g.  When respondents are told that they may leave as soon as
they have completed the questionnaire they usually do a
much more hasty and unsatisfactory job than when they
are  given  a specific  time  for  completion,  and  are  told 
that  they  cannot  leave  until  the  time  period  is  up. 

h.  See  Chapter  XIV  about  the  behavior  of  interviewers. 




X-E  Page  1 
1 Jul  76 


E.  Administration  Time 


Little  is  known  about  the  effects  of  questionnaire  administration 
time on respondents' motivation, or of the effects of setting time
limits for completing questionnaires. The questionnaire
administration period should generally have been determined in advance
by  pretesting.  Although  there  will  be  some  variability  in  the 
length  of  time  taken  to  complete  a questionnaire,  there  is 
remarkable  consistency  among  those  who  are  sincere  in  attempting 
to  do  an  accurate  and  complete  job  of  answering  all  questions. 


When  a questionnaire  is  administered  to  a group  of  respondents, 
the  instruction  should  emphasize  that  all  respondents  will  be  given 
plenty  of  time  to  answer  the  questions.  As  indicated  earlier  in 
X-D  5 g,  the  instructions  should  not  tell  the  respondents  that 
they  can  leave  as  soon  as  they  have  finished  the  questionnaire, 
since  many  will  then  cut  short  their  efforts  to  answer  the 
questions.  There  is  little  hope  of  obtaining  carefully  considered 
evaluative  responses  on  a questionnaire  if  the  respondent  knows 
that  the  faster  he  finishes  the  questionnaire  the  sooner  he  will 
be  able  to  go  home. 


Questionnaire  administration  time  is  obviously  related  to 
questionnaire  length,  which  is  the  topic  of  Section  IX-C. 


Every attempt should be made to determine the maximum time
needed  to  complete  a given  questionnaire.  If  the  questionnaire 
is  group  administered,  the  maximum  time  for  the  slowest  respondents 
should  usually  be  used  in  scheduling  the  administration  of  the 
questionnaire. 
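
As a small worked example of scheduling from pretest results, the sketch below takes the completion time of the slowest pretest respondent and rounds it up to a convenient block. The function name, the sample times, and the choice to round up to the next 5 minutes are assumptions for illustration; the manual itself only says that the maximum time should usually be used.

```python
import math

def scheduled_minutes(pretest_minutes, round_to=5):
    """Schedule for the slowest pretest respondent, rounded up to a convenient block."""
    slowest = max(pretest_minutes)
    return math.ceil(slowest / round_to) * round_to

# Hypothetical completion times, in minutes, observed during a pretest.
pretest = [18.5, 21.0, 19.5, 22.0, 27.5, 20.0]
print(scheduled_minutes(pretest))
```

With these assumed pretest times the slowest respondent needed 27.5 minutes, so the session would be scheduled for 30 minutes.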




X-F  Page  1
1 Jul  76


F.  Characteristics of Administrators


As with other areas of this manual, little has been established
in  the  research  literature  about  how  the  characteristics  of 
questionnaire  administrators  affect  the  overall  process  with 
nonremotely  administered  questionnaires.  The  following  items 
may  be  noted: 

1.  In  most  cases  it  is  felt  that  the  sex  of  the  administrator 
has  no  effect  on  the  responses  received.  There  may,  however, 
be  certain  motivational  effects. 

2.  The  military  rank  of  the  administrator  may  have  an  effect  on 
the  respondent,  but  no  research  has  been  performed  to 
indicate  this. 

3.  Any  effect  that  the  race  of  the  administrator  has  on  the 
respondent may be a function of the content material of the
questionnaire; e.g., race would be expected to influence
responses  on  a race  relations  questionnaire  more  than  on  a 
questionnaire  dealing  with  rifle  comparisons.  The  effects 
should  probably  be  viewed  as  the  result  of  interaction 
between  administrator  and  respondent  characteristics,  and 
the  questions  being  asked. 

4.  See Chapter XIV about the influence of an interviewer on the
interviewee. 


G.  Administration Conditions


Questionnaire  administration  conditions  obviously  cannot  be 
controlled with remotely administered questionnaires. With
group  administered  questionnaires,  the  following  guidance  is 
offered: 

1.  Administration  conditions  should  be  provided  which  are  most 
appropriate  to  the  particular  type  of  respondent  completing 
the  questionnaire. 

2.  Administration  conditions  have  an  effect  on  questionnaire 
responses.  For  example,  different  responses  may  be  obtained 
if  the  questionnaire  is  filled  out  in  a group  situation  on 
the  job  rather  than  individually  at  home. 

3.  When  personnel  are  being  rated,  different  ratings  may  be 
obtained  depending  on  how  acquainted  the  rater  and  ratee  are. 

4.  For  Army  field  test  evaluations,  the  circumstances  under  which 
questionnaires  must/can  be  administered  will  vary  rather  widely. 
There may be times when no writing surface(s) or pencils are
available; clipboards and pencils should be supplied if this
problem can be anticipated. If the needed materials cannot
be brought to the respondents, then arrange to move them to
a place where the materials and other environmental conditions
are satisfactory.

5.  Respondents  should  be  required  to  give  their  answers  without 
being  influenced  by  other  respondents.  Achieving  this  requires 
respondents  to  be  somewhat  separated  and/or  to  have  the 
administrator(s) watching them. Simply instructing them not
to consult with each other is usually not sufficient.


X-H  Page  1
1 Jul  76 


H.  Training  of  Field  Test  Evaluators 

An  extended  discussion  of  the  training  of  raters  and  other  test 
evaluators  is  not  undertaken  in  the  preliminary  version  of  this 
manual.  The  following  suggestions,  however,  can  be  offered  about 
the  general  training  of  the  Army  field  test  evaluators.  See 
Section  X-B  regarding  questionnaire  administration  instructions. 

1.  Impress  on  test  evaluators  that  they  are  supposed  to  answer 
the  questionnaire  based  upon  what  they  observe  in  the  test. 
Stress  the  need  for  evaluations  based  only  upon  what  was 
seen during the test exercise, regardless of any personal
feelings or knowledge of concepts or equipment as might exist
in a true combat environment (except in special instances
where this is specifically asked for). To help identify
and reduce prejudgment, a broad question might be included
to permit the evaluator to express any bias he may have.
It may be a question such as "Based on your personal experience,
do you feel the "DPST" is a useful approach to real daily
problems, i.e., outside a test exercise environment?"
Such a question would permit the evaluator an outlet for
preconceived opinions and attitudes which otherwise would
color his view of the events observed during the exercise.
On the other hand, in some situations the evaluator might
feel it necessary to defend this personal judgment by biasing
his answers to the remaining questions.

2.  Stress  the  importance  of  evaluators  to  the  success  of  the 
test. Perhaps briefly indicate some actions which have been
taken  to  implement  concepts  supported  by  evaluative  data 
from  previous  tests. 

3.  Permit  evaluators  (particularly  after  the  pilot  test)  to 
sound  off  about  the  forms  and  their  perceived  inadequacies, 
regardless  of  how  unreasonable  these  complaints  might  be. 

The goal is to have all evaluators answering questionnaires
understand that they are active contributors rather than
just a means to an end.

4. Constantly examine completed questionnaires to insure that
they  have  been  filled  out  and  understood.  This  procedure 
should  continue  throughout  the  entire  series  of  tests. 

5. Stress the notion that complete honesty and objectivity are
needed. Sometimes evaluators try to please the test sponsors,
to the detriment of the test.




6.  Indicate  to  evaluators,  perhaps  on  the  top  of  all  questionnaires 
or  verbally,  that  they  may  make  marginal  note  clarifications 
concerning  their  scale  value  selection  for  any  rating  question. 
This  will  increase  posttest  accuracy  in  determining  questions 
which  are  scaled  awkwardly  or  unclearly  stated.  This  is 
particularly  crucial  during  the  pretesting  or  pilot  test. 

Notes  should  be  made  regarding  question  structure  immediately 
as  they  occur  to  the  evaluator  or  the  difficulty  is  likely 
to  be  forgotten. 

7. Prior to having the evaluators complete questionnaires, ask all,
or  a few  randomly  selected  evaluators  to  verbally  describe  to 
the  other  evaluators  what  they  believe  each  question  is  asking. 
This  procedure  will  reduce  differences  between  judges  because 
of  varying  semantic  interpretations.  By  the  time  of  the  actual 
exercise,  all  evaluators  should  generally  agree,  for  example, 
on  the  meaning  of  "command  and  control  effectiveness,"  "fire 
power  potential,"  etc.  If  this  is  done,  the  criteria  will 
have  mutual  acceptance. 

8. Evaluators should be forewarned about biases such as the halo
effect, central tendency, and others discussed in Chapter XII.
If it is explained to the evaluator that these are common
biases to which we are all subject, he will be better able to
consider the fairness and accuracy of his observations.

9.  The  independent  evaluation  of  each  question  should  be  stressed. 


I.  Other  Factors  Related  to  Questionnaire  Administration 




Some  other  factors  related  to  questionnaire  administration  that 
have  not  been  discussed  in  other  sections  of  this  manual  are 
addressed  below: 

1.  Respondents  may  at  times  be  influenced  by  the  title  of  the 
questionnaire. The word "test" should not be used in a
title of a questionnaire as it may imply that it is a test
of  the  respondent's  knowledge. 

2.  A problem  with  Army  field  test  evaluations  concerns  undue 
influence  by  the  questionnaire  administrator.  It  is  sometimes 
necessary  to  use  line  officers  from  the  units  of  the  test 
subjects  as  questionnaire  administrators.  When  outside 
administrators are used, they must be carefully instructed
to make no comments whatsoever regarding their personal
opinions  of  the  items  being  evaluated.  An  offhand  comment 
by  a company  commander  administrator  to  his  company  regarding 
the  "goodness"  or  "badness"  of  a piece  of  equipment  or  concept 
being  evaluated  can  exert  an  influence  sufficient  to  distort 
the  results  significantly  from  what  they  would  otherwise  have 
been. 


3. The manner in which test subjects are selected and utilized
in operational tests may affect the manner in which they
respond to questionnaire items. For example, separate groups
with no prior experience with either the test system or the
current standard system could evaluate each system. This
would exclude pretest biases, but test subjects would have
no basis to compare the two systems. Alternatively, the
same group of test subjects could use both systems in
rotation. However, this procedure may result in a bias for
or against one or both systems as a function of which was
used first. In this respect too, personnel having extensive
prior experience with a current standard system may introduce
their pretest biases for or against that system when it is
being evaluated against a candidate replacement system. The
consequence of such considerations is that the type of system
evaluation intended will govern the way evaluators and/or
test subjects are selected and utilized. The methods of
selection and utilization will influence the way questionnaires
must be designed, and in turn suggest the types of
problems likely to arise.




Chapter  XI:  Pretesting  of  Questionnaires 


A.  Overview 


Even  the  most  careful  screening  of  a questionnaire  by  its  developer 
or  by  questionnaire  construction  experts  will  usually  not  reveal 
all  of  its  faults.  Pretesting  is  an  important  and  essential 
procedure  to  follow  before  administering  any  questionnaire.  Its 
purpose  is,  of  course,  to  find  those  overlooked  problems  and 
faults  that  would  otherwise  reduce  the  validity  of  the  information 
obtained  from  the  questionnaire  responses.  However,  just  any 
pretest  will  not  do.  One  must  know  how  to  pretest  the  items 
and  what  to  look  for. 

Some  guidelines  for  pretesting  questionnaires  are  given  in  this 
chapter.  Pretesting  may  seem  to  some  uninformed  individuals  to  be 
a waste  of  time,  especially  when  the  author  may  have  asked  several 
people  in  his  own  office  to  critique  the  questions,  or  perhaps  even 
asked  a questionnaire  specialist  to  critique  it.  However,  pretesting 
is  an  investment  that  is  well  worthwhile.  It  is  crucial  if  the 
decision that will result from the questionnaire is of any importance.




B. Guidelines for Pretesting Questionnaires




1.  It  is  important  that  the  respondents  employed  in  pretesting 
be  representative  of  the  eventual  target  respondents.  For 
example,  if  infantry  enlisted  men  will  perform  in  a test  and 
then  take  the  questionnaire,  it  should  not  be  pretested  with 
respondents  who  are  armored  officers;  even  infantry  officers 
would  not  be  satisfactory. 

2.  The  pretest  is  more  useful  if  it  is  conducted  by  someone  who 
knows  the  operations  to  be  performed  in  the  test  and  who  also 
knows  the  subject  matter  that  the  questionnaire  covers.  It 
is  best  if  the  question  writer  himself  is  knowledgeable  about 
these  operations  and  conducts  the  pretest. 

3. Interview and pretest some of the pretest respondents one at
a time. Ask each respondent to read each question and
explain its meaning. Also ask him to explain the meaning
of the response alternatives and to make his choice, and then
ask him to explain why he made his particular choice. The
respondents' answers will frequently reveal incorrect
assumptions and possible rationales that the question
writer never dreamed possible. They will also help to
identify lack of understanding of particular words, vague
or ambiguous phrases, ill-defined or loaded questions, etc.

4. One good technique for pretesting is to have the respondent
read  each  question  aloud  and  then  to  tell  you  what  it  means. 
Any  difficulties  at  all  should  be  a cause  for  concern  and 
revision. 

5.  During  pretesting  the  respondents  should  be  encouraged  to 
make  marginal  notes  on  the  questionnaire  regarding  sentence 
structure,  unclear  questions  or  statements,  etc. 

6.  When  attitude  questions,  especially,  are  being  pretested, 
individuals  who  may  hold  minority  views  should  be  included. 
This  will  help  identify  loaded  questions. 

7.  Open-ended  questions  may,  and  often  should,  be  included  in 
early  pretest  versions  of  a questionnaire  in  order  to  identify 
requirements  for  additional  questions.  Pretesting  may  also 
provide  information  that  can  be  used  to  convert  open-ended 
questions  to  multiple  choice  questions  to  facilitate  data 
reduction  and  analysis. 




8.  Pretests  for  the  selection  of  verbal  anchors  are  valuable 
in  building  rating  scale  content  validity  and  reliability. 
Rather  than  employing  anchors  which  seem  appropriate,  the 
anchors  used  in  the  final  scales  should  be  selected  as  a 
result  of  analyses  of  pretests  of  respondents  similar  to 
those  who  will  be  participating  in  the  final  test. 

9.  While  pretesting  a questionnaire,  a high  proportion  of 
respondents  giving  no  response  or  a "Don't  know"  response 
should  be  a cause  for  concern.  However,  a low  number  of 
"Don't  know"  responses  (especially  for  multiple  choice 
items)  does  not  guarantee  that  the  question  is  good. 

10. Often more than one pretest is needed. At times questionnaires
may have to go through six or more pretests and
revisions.

11. After pretesting, each question should be reviewed and its
inclusion in the questionnaire justified. Questions that
do not add significant information or that largely duplicate
other questions can profitably be eliminated.
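
The check described in guideline 9 can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the
function name flag_items, the data layout, and the 25 percent cutoff
are hypothetical and are not given in this manual.

```python
def flag_items(responses, threshold=0.25):
    """Return items whose share of blank or "Don't know" answers exceeds threshold.

    responses: dict mapping an item label to a list of answers, where
               None stands for no response (assumed layout).
    threshold: hypothetical cutoff; the manual does not state a number.
    """
    flagged = {}
    for item, answers in responses.items():
        if not answers:
            continue  # nothing to evaluate for this item
        unusable = sum(1 for a in answers if a is None or a == "Don't know")
        share = unusable / len(answers)
        if share > threshold:
            flagged[item] = share
    return flagged


# Hypothetical pretest data for two items.
pretest = {
    "Q1": ["Agree", "Disagree", "Agree", "Don't know"],
    "Q2": ["Don't know", None, "Agree", "Don't know"],
}
print(flag_items(pretest))
```

As guideline 9 cautions, an item that is not flagged is not thereby
shown to be a good question; the check only points to items that
deserve a second look.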




Chapter  XII:  Characteristics  of  Respondents 

That  Influence  Questionnaire  Results 


A.  Overview 


This chapter discusses some characteristics of respondents that
influence questionnaire results. It therefore identifies some
of the principal sources of error in the reporting of observations
and/or the evaluation of performance in, for example,
operational Army field tests. Additional research is required,
however, to determine their relative contributions to error
variance.

Sections XII-B and C present a discussion of various
biases, response sets, or other sources of error. There is
some confusion in the literature regarding the use of these
terms, but they are similar. A bias is: a tendency to deviate
from a true value; a tendency to favor a certain position or
conclusion; or an attitude either for or against a certain
unproved hypothesis which prevents an individual from evaluating
the evidence correctly. A response set or response bias refers
to the tendency of a respondent to answer questions in a particular
way almost independent of the content of the questions. And an
error is simply a mistake or departure from correctness.

Section XII-D addresses the effects of attitudes of respondents
on questionnaire results, while Section XII-E considers the effects
of demographic characteristics on responses.

One  of  the  main  purposes  of  this  chapter  is  to  alert  the 
questionnaire  designer  to  some  of  the  characteristics  of  respondents 
that  influence  questionnaire  results.  There  are  ways  that  some  of 
the  biases  and  errors  can  be  controlled,  but  not  all  of  them.  And 
there  appears  to  be  no  easy  way  of  detecting  the  influence  of  a 
response  set  nor  of  neutralizing  it.  More  detailed  identification 
and  control  methods  are  areas  of  needed  further  research. 




B.  Social  Desirability  and  Acquiescence  Response  Sets 


Social desirability is a response set where persons answer
according to the norms they believe society condones. It is
the tendency to agree with items the respondent believes
reflect socially desirable attitudes in order to show himself
in a better light. Acquiescence response set is the tendency
to consistently agree, to say "Yes," or to say "True." It is
a general tendency to assent rather than dissent. Although
there  have  been  a number  of  studies  about  each,  a detailed 
discussion  of  them  is  beyond  the  scope  of  this  manual.  Some 
comments  about  each  are  presented  below. 


1.  Social  Desirability  Response  Set 


a. Social desirability response set seems to operate whenever
the respondent has the opportunity to respond in
terms  of  it.  Some  believe  that  its  effect  is  so  powerful 
that  respondents  would  not  tend  to  deviate  from  social 
norms  in  their  answers  even  though  their  behavior  denied 
what  they  said. 


b.  Several  authors  have  identified  respondents  with  a high 
social  desirability  response  rate.  They  found  these 
respondents  to  give  more  true  responses  to  neutral  items, 
to  be  more  susceptible  to  social  pressures,  to  more  likely 
be  introverts,  and  to  score  higher  on  a "lie"  scale. 


c. Faking or responding with socially desirable answers
which  are  not  true  is  part  of  the  response  set. 


d.  Anonymity  fails  to  eliminate  the  social  desirability 
response  set. 


e. The forced choice instrument format has been studied for
its susceptibility to social desirability response set,
a factor it was intended to control. Some authors found
the forced choice method minimized the effects of social
desirability, while others think the factor still needs
additional control. One study concludes that in forced
choice formats ambiguous items tend to be freer of
social desirability response set than positively or
negatively worded items. In any case, the evidence
indicates that the social desirability problem is
usually less in forced choice formats than in other
item types.

f.  Even  card  sorts  need  control  to  eliminate  social 
desirability  bias. 




g.  Procedures  have  been  developed  for  controlling  or 
balancing  social  desirability  by  using  loaded  items 
in  the  questionnaire  and  then  adjusting  the  respondent's 
score.  The  social  desirability  score  from  the  loaded 
items  can  also  be  correlated  with  each  of  the  other 
items  on  the  questionnaire.  The  responses  on  those 
items  with  a statistically  significant  correlation  can 
then  be  corrected  by  moving  the  response  one  or  more 
steps from the socially desirable response to give a more
accurate  result. 
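
The adjustment described in item g might be sketched as follows. The
sketch rests on several assumptions that the manual does not state:
higher ratings are taken to be the socially desirable direction, every
response on a correlated item is moved one step, and a fixed cutoff on
the correlation (critical_r, a hypothetical parameter) stands in for a
proper significance test. The names pearson_r and correct_items are
illustrative only.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation, so no measurable relationship
    return sxy / math.sqrt(sxx * syy)


def correct_items(sd_scores, items, critical_r, step=1, scale_min=1):
    """Shift ratings on items that correlate with the social desirability score.

    sd_scores:  one social desirability score per respondent (from loaded items).
    items:      dict mapping item label to that item's ratings, same respondent order.
    critical_r: assumed cutoff standing in for a significance test.
    """
    corrected = {}
    for name, ratings in items.items():
        if pearson_r(sd_scores, ratings) >= critical_r:
            # Assumption: the high end is the socially desirable end, so move down.
            corrected[name] = [max(scale_min, r - step) for r in ratings]
        else:
            corrected[name] = list(ratings)
    return corrected


# Hypothetical data: five respondents, two items on a 1-to-5 scale.
sd_scores = [1, 2, 3, 4, 5]
items = {"Q1": [1, 2, 3, 4, 5], "Q2": [3, 3, 2, 3, 3]}
corrected = correct_items(sd_scores, items, critical_r=0.8)
print(corrected)
```

In practice the cutoff would come from a table of critical values for
the sample size in use, and the size of the shift would have to be
justified for the particular scale.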

2.  Acquiescence  Response  Set 

a.  The  acquiescence  response  set  is  defined  as  a behavioral 
attitude  by  the  respondent  to  agree  and  accept,  even  if 
he  must  alter  his  original  opinions  to  do  so. 

b. The acquiescence response set seems to operate especially
when  statements  are  in  the  form  of  plausible  generalities. 

c.  The  response  set  may  occur  more  with  difficult  than  with 
easy  questionnaire  material. 

d. Acquiescence response set may be a personality trait.

e. There is a concern that social desirability and acquiescence
response sets may be related in such a way that an
individual with a tendency toward conformity will consistently
reflect both biases.

f. Controls for acquiescence response set have been researched.
Stating the question stem in a neutral manner may help
minimize acquiescence. The effects of acquiescence
response set may also be partially controlled by using
two alternate questionnaire forms with the question stated
positively on half of the forms and stated negatively on
the other half. The balancing of scales (e.g., equal
number of positive and negative points) may also be of
value in counteracting acquiescence.
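
When two alternate forms are used as described in item f, the ratings
from the negatively stated form must be reversed before the two halves
are pooled. A minimal sketch follows, assuming a 1-to-5 agreement
scale; the function names and sample data are hypothetical.

```python
def reverse_score(rating, scale_min=1, scale_max=5):
    """Map a rating on a negatively stated item onto the positive item's direction."""
    return scale_min + scale_max - rating


def pooled_mean(positive_form, negative_form, scale_min=1, scale_max=5):
    """Mean rating across both forms after aligning the negative form."""
    aligned = list(positive_form) + [
        reverse_score(r, scale_min, scale_max) for r in negative_form
    ]
    return sum(aligned) / len(aligned)


# Hypothetical ratings from two half-samples.
form_a = [4, 5, 4, 3]  # agreement with the positively stated question
form_b = [4, 4, 2, 2]  # agreement with the negatively stated question
print(pooled_mean(form_a, form_b))
```

A respondent who simply assents will raise the mean on one form and
lower it on the other, so the pooled figure is less affected by
acquiescence than either form alone.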


C.  Other  Response  Sets  or  Errors 




This  section  notes  a number  of  other  response  sets  or  errors 
of  which  the  questionnaire  developer  should  be  aware. 

1.  Error  of  Central  Tendency 


Some respondents tend to avoid endpoints on a scale, and
pick a middle value regardless of their true feelings. It
may be more common when the respondent is not very familiar
with whatever he is being asked to rate. It may be counteracted
by adjusting the strength of the response alternatives
so that there are greater differences in meaning between
alternatives near the ends of the scale than between
alternatives near the center.

2. Extreme Response Set


On  the  other  hand,  some  individuals  tend  to  consistently 
select  exaggerated  choices  for  positions.  It  can  be 
recognized  when  a respondent  makes  a pattern  of  answers 
which tend to be unevenly distributed toward one or both
ends  of  a scale.  Research  indicates  that  this  response 
set  may  be  a personality  characteristic. 
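
Both the error of central tendency and the extreme response set show
up as a pattern in one respondent's answers. One rough way to screen
for either is to compute the share of a respondent's ratings that fall
at the midpoint and at the endpoints of the scale. The sketch below
assumes a 1-to-5 scale with a single midpoint; the function name and
the data are hypothetical, and no cutoff is implied by this manual.

```python
def response_style(ratings, scale_min=1, scale_max=5):
    """Return the share of ratings at the scale endpoints and at the midpoint."""
    n = len(ratings)
    midpoint = (scale_min + scale_max) / 2
    ends = sum(1 for r in ratings if r in (scale_min, scale_max))
    middle = sum(1 for r in ratings if r == midpoint)
    return {"endpoint_share": ends / n, "midpoint_share": middle / n}


# Hypothetical respondent who mostly picks the middle value.
cautious = [3, 3, 3, 2, 3, 3, 4, 3]
print(response_style(cautious))
```

A high share is only a signal for closer review, since a respondent
may hold genuinely moderate or genuinely strong views.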

3.  Halo  Effect 


Halo effect was originally defined as a tendency, when one
is estimating or rating a person with respect to a given
trait, to be influenced by some other trait or by one's
general impression of the person. It is, however, also
applicable to ratings of other than people. For example,
if a field test evaluator knows that a particular weapon
system did well in one phase of a test, he may be
influenced to give high ratings to the system in later test
phases - even when the system performs poorly.

Most studies of ways to control halo effect have dealt
with ratings of traits of personnel by other personnel, a
matter not of great concern in this manual. The forced
choice technique minimizes halo effect in some situations.
Ratings will also be less distorted if questionnaire items
are constructed so as to relate to clearly observable
aspects of behavior which do not overlap. It is doubtful
that the influence of halo effects can be completely
eliminated from the responses to any questionnaire.


4. Leniency Error




Leniency  error  refers  to  a general,  constant  tendency  for  a 
rater  to  rate  either  too  high  or  too  low  in  the  direction  of 
being  too  generous.  It  appears  similar  to  halo  effect  except 
that  it  is  independent  of  the  trait  or  factor  being  rated. 
Some raters have an opposite tendency to rate too severely.

In  large  groups  of  raters  the  opposite  tendencies  should 
balance  out. 

5.  Logical  Error 

Logical  error  is  also  similar  to  halo  effect.  It  is  due  to 
the  fact  that  raters  are  likely  to  give  similar  ratings  to 
traits  or  items  that  seem  logically  related  to  them.  For 
example, a field test evaluator may know that a counterattack
was extremely successful; he may, therefore, reason
that command and control was also very effective and should
receive an equivalent high evaluation because a successful
counterattack  is  a function  of  good  command  and  control. 

Such  reasoning  assumes  a dependence  which  may  or  may  not  be 
true.  Logical  error  may  be  avoided  in  part  by  asking  for 
judgments  of  objectively  observable  actions  or  behavior. 

6.  Proximity  Error 

Proximity error occurs when, due to the ordering of questionnaire
items, the answer to one item results in an answer to
a subsequent  question  being  substantially  changed  from  what 
it  would  otherwise  have  been.  Little  is  known  about  its 
influence  in  field  test  situations;  most  research  in  this 
area  has  concerned  the  rating  of  personality  trait  variables. 

7. Contrast Error

Contrast error refers to a tendency for a rater to rate
others in the opposite direction from himself in regard to
a trait. Little research has been done on this source of
error. 

8.  Feedback  Bias 


Research shows that if observers are informed of experimental
hypotheses and if they receive daily feedback indicating how
well their data support the hypotheses, they will tend to
report data supporting those hypotheses - even when the
reverse is true! This bias does not seem to occur, however,
when observers are informed only of the experimental
hypotheses with no follow-up. Taking precautions to assure
high levels of observer accuracy minimizes the bias.


D.  Effects  of  General  Pretest  Attitudes  of  Respondents 




Limited research has been conducted upon how the attitudes of
a respondent influence questionnaire results. The following,
however, should be noted:

1.  Respondents  at  times  base  their  ratings  not  on  what  is  observed 
but  on  what  they  believed  prior  to  the  observation.  Beliefs 
and  opinions  may  affect  results. 

2.  It  is  generally  believed  that  judges  used  as  part  of  the 
process  of  determining  scale  values  can  rate  items  without 
being  influenced  by  their  own  attitudes.  There  is  also  some 
evidence  to  the  contrary. 

3.  Unstable  or  changing  responses  to  questionnaires  may  be  caused 
by  shifts  in  the  mood  of  the  respondent,  relative  values  among 
the possible choices, and the degree of interest present in
the  question. 

4.  As  questions  become  more  ambiguous,  responses  normally  become 
more  attitudinally  based. 

5. It may be desirable to revise a questionnaire when norms of
groups  differ  greatly  from  those  with  whom  the  questionnaire 
was  pretested  or  previously  administered. 




E. Effects of Demographic Characteristics on Responses

Demographic  characteristics  have  been  shown  to  influence 
questionnaire  results.  Similarities  of  such  variables  among 
respondents  often  tend  to  be  related  to  a response  pattern. 

These variables include: age, religion, sex, intelligence,
marital status, parenthood, socioeconomic class, nationality,
urban or rural residence, income, rank and experience.
Questionnaires should, therefore, be designed with the respondents'
background in mind. When there is a suspicion that demographic
characteristics  may  affect  response,  the  data  should  be  analyzed 
by  type  of  respondent. 
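
Analyzing the data by type of respondent can be as simple as computing
a separate summary for each group. The following sketch assumes that
each completed questionnaire is held as a record with a demographic
field and a numeric rating; the field names and values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_by_group(records, group_key, item_key):
    """Return the mean rating on one item for each value of a demographic field."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec[item_key])
    return {group: mean(values) for group, values in groups.items()}


# Hypothetical records: one rating per respondent on item Q1.
records = [
    {"rank": "enlisted", "Q1": 2},
    {"rank": "enlisted", "Q1": 3},
    {"rank": "officer", "Q1": 4},
    {"rank": "officer", "Q1": 5},
]
print(mean_by_group(records, "rank", "Q1"))
```

A marked difference between groups suggests that the pooled result
should not be reported alone.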




Chapter  XIII:  Evaluating  Questionnaire  Results 


A.  Overview 


An  extended  discussion  on  evaluating  questionnaire  results  is 
currently  outside  the  scope  of  this  manual  on  questionnaire 
development. There are, however, some factors relating to the
evaluation  of  questionnaire  results  that  should  be  noted  since 
they  may  influence  how  questionnaires  are  designed  and  developed. 
Section  XIII-B  considers  the  scoring  of  questionnaire  responses, 
and  Section  XIII-C  contains  some  notes  about  data  analyses. 


B. Scoring Questionnaire Responses

1. Practical Considerations




a. Both time and money can be saved by planning the
questionnaire  in  line  with  scoring  and  tabulation 
requirements.  The  phrasing  of  questions  and  their 
sequencing  and  layout  affect  tabulation  time. 

b.  A decision  should  be  made  ahead  of  time  regarding 
whether  the  data  will  be  tabulated  by  hand  or  machine. 

c.  Response  alternatives  should  be  precoded  whenever 
possible. 

d. Since it does not seem to matter if items are scrambled
or in blocks according to content, blocking may be preferred
due to greater ease of hand scoring.

e.  See  Section  IX-E  regarding  the  use  of  answer  sheets. 

2.  Other  Considerations 


a. There may be a justification for scoring rating scale
items dichotomously according to the direction of
response. It is sometimes done when bipolar scales are
analyzed in terms of the proportion of responses in
either direction of the basic dichotomy. The justification
is based upon results that seem to indicate that
composite scores reflect primarily the direction of
responses and only to a minor extent their intensities.

b.  One  investigator  found  that  many  Likert-type  rating 
scales  consisting  of  2 through  19  steps  may  be 
collapsed into two or three measurement categories
for  analysis  with  no  lack  of  precision. 

c. When working with paired comparison items with a "No
preference" option, the "No preference" responses can
often be either divided in proportion to the preference
responses, or disregarded altogether. The basis for this
suggestion is that respondents who claim neutrality appear
to exhibit the same preference patterns as those who
express a preference.




d. By using any one of several methods of scoring or
transforming self-rating scale raw scores, it is
usually possible to approximate dyadic forced choice
results with considerable saving in administration
time, and a small gain in test-retest reliability.

e. The concurrent validity of questionnaires may be somewhat
increased by using item weights obtained by expert
scaling instead of conventional unit weights, but it may
not be worth the effort.

f.  Investigators  sometimes  use  intensity  scores  as  well  as 
rating  scale  content  scores.  One  way  of  obtaining  an 
intensity  score  is  to  follow  each  question  with  the 
query  "How  strongly  do  you  feel  about  this?"  A second 
way  involves  weighting  extreme  responses  (positive  and 
negative)  as  2,  moderate  responses  as  1,  and  neutral 
responses as 0. These weights can then be summed for
an  intensity  score. 
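
Three of the scoring ideas in items a, c, and f lend themselves to a
short worked sketch. It assumes a 5-point bipolar scale coded 1 to 5
with 3 as the neutral point, so that the weights of 2, 1, and 0 equal
the distance from neutral. The function names are hypothetical, and
the even split used when nobody states a preference is an assumption
of this sketch, not a recommendation of the manual.

```python
def direction_proportions(ratings, neutral=3):
    """Item a: proportion of non-neutral responses in each direction."""
    favorable = sum(1 for r in ratings if r > neutral)
    unfavorable = sum(1 for r in ratings if r < neutral)
    decided = favorable + unfavorable
    if decided == 0:
        return 0.0, 0.0
    return favorable / decided, unfavorable / decided


def split_no_preference(prefer_a, prefer_b, no_preference):
    """Item c: divide "No preference" counts in proportion to stated preferences."""
    decided = prefer_a + prefer_b
    if decided == 0:
        # Assumption: with no stated preferences, split evenly.
        return prefer_a + no_preference / 2, prefer_b + no_preference / 2
    share_a = prefer_a / decided
    return (prefer_a + no_preference * share_a,
            prefer_b + no_preference * (1 - share_a))


def intensity_score(ratings, neutral=3):
    """Item f: extreme responses weigh 2, moderate 1, neutral 0 (5-point scale)."""
    return sum(abs(r - neutral) for r in ratings)


ratings = [5, 4, 3, 2, 5]  # hypothetical responses from one respondent
print(direction_proportions(ratings))
print(split_no_preference(30, 10, 20))
print(intensity_score(ratings))
```

For scales with more than five steps the intensity weights would have
to be assigned explicitly rather than taken from the distance to the
neutral point.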




C.  Data  Analyses 

A detailed  discussion  of  data  analysis  is  beyond  the  scope  of 
this  manual;  however,  some  basic  data  analysis  issues  have  been 
mentioned  in  related  chapters.  Additionally,  the  following 
points  are  also  noted: 

1. Analysis of questionnaire responses is chiefly of two types:
summary tabulations and statistical analyses. Tabulations
are used primarily for the presentation of results.
Statistical tests are used to determine whether the differences
in the results are significant. Statistical
literature is available which presents numerous tests
usable in such analyses.

2.  As  part  of  the  questionnaire  development  process,  tentative 
(dummy)  analysis  tables  should  be  developed  to  assure  that 
the  data  to  be  obtained  are  appropriate. 

3. Four kinds of measurement scales have been identified:
nominal, ordinal, interval, and ratio. Appropriate
statistical analyses are associated with each. Hence,
the data analysis limitations of various forms of questionnaires
should be considered before an instrument is
designed. For example, less can be done statistically
with open-ended questions than with ranking questions.
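
The point in item 3 can be illustrated with the measure of central
value that is conventionally paired with each kind of scale: the mode
for nominal data, the median for ordinal data, and the mean for
interval or ratio data. This pairing is a common convention supplied
here for illustration; it is not a rule taken from this manual, and
the function name and data are hypothetical.

```python
from statistics import mean, median, mode


def summarize(values, scale):
    """Return a central value suited to the stated measurement scale."""
    if scale == "nominal":
        return mode(values)      # categories can only be counted
    if scale == "ordinal":
        return median(values)    # order is meaningful, distances are not
    if scale in ("interval", "ratio"):
        return mean(values)      # distances between values are meaningful
    raise ValueError("unknown measurement scale: " + scale)


print(summarize(["rifle", "rifle", "mortar"], "nominal"))
print(summarize([1, 2, 2, 4, 5], "ordinal"))
print(summarize([10, 20, 30], "interval"))
```

Deciding which branch a question falls under before the instrument is
printed is one way of drawing up the dummy analysis tables mentioned
in item 2.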




Chapter  XIV:  Interview  Considerations 


A.  Overview 


If properly used, the interview is an effective means of obtaining
data. It is a technique in which an individual is questioned by
a skilled and trained interviewer who records all replies, preferably
verbatim in most cases. Most of the principles of questionnaire
construction discussed in previous chapters pertain to the
interview as well. This chapter, however, notes some issues
specifically related to interviews.

Section  XIV-B  presents  the  distinction  between  structured  and 
unstructured  interviews.  Interviewer's  characteristics  relative 
to the interviewee are noted in Section XIV-C. Situational
factors  are  noted  in  Section  XIV-D,  while  the  topics  of 
Sections  XIV-E,  F,  and  G are,  respectively,  training  interviewers, 
data  recording  and  reduction,  and  special  problems.  There  is, 
unfortunately,  little  that  can  be  recommended  to  avoid  some  of 
the  problems  noted  in  this  chapter.  The  questionnaire  developer 
should,  in  any  case,  be  aware  of  them. 


B.  Structured  and  Unstructured  Interviews 




The term "structured" when applied to interviews is intended to
emphasize that the interviewer employs a script of all the questions
to be asked. In the unstructured interview the interviewer
may know many of the topics to be covered but needs to
learn more about the subject overall, so he is willing to be
led by the interviewee even into digressions. Unstructured
interviews may occur as a preliminary to preparing either a
questionnaire or a structured interview script. One could use
a questionnaire as the script for a structured interview if he
already had the questionnaire developed, but not enough time to
convert it to a more convenient format. The main difference
between the structured interview and questionnaire is procedural.

The degree of proficiency required of interviewers in conducting
an unstructured interview is generally not available
during Army field test evaluations. A structured interview
requires the interviewer to have only moderate skill and proficiency,
and hence is usually preferred. The advantages of
the structured interview include: the opportunity to probe
for all the facts when the respondent gives only a partial or
incomplete response; a chance to insure that the question is
thoroughly understood by the respondent; and an opportunity to
pursue other problem areas which may arise during an interview.
The structured interview is almost always preferable to a
questionnaire when the test group is small (10 to 20), and when
time and test conditions permit.

As noted in Section II-B, unstructured interviews are not
included within the definition of questionnaire used in this
manual.  They  are,  therefore,  not  discussed  further. 


C. Interviewer's Characteristics Relative to Interviewee




More  research  is  needed  to  identify  how  characteristics  of  an 
interviewer  affect  the  respondent.  Some  areas  of  concern  are 
presented  below. 

1. Rank, Grade or Status of the Interviewer


For Army field test evaluations it is recommended that the
interviewer be of similar rank or grade to the
individuals being interviewed. A difference in rank or
grade introduces a bias in the data which has been found
to substantially influence test results. Interviewees
tend to give the answer they perceive the higher ranking
interviewer favors. When the interviewer is of lower
grade, the interviewee may not show respect and may not
cooperate.

Evidence  indicates  that  the  greater  the  disparity 
between  the  status  of  the  interviewer  and  that  of  the 
respondent,  the  greater  the  tendency  for  biased  responses. 
The respondent tends to answer favorably in the eyes of
the  more  serious  interviewer. 

Data suggest that in the interview situation the
respondent tends to support the norms adhered to by the
interviewer. Lower socioeconomic respondents may defer to
the norms represented by a higher status interviewer. The
effect, however, is related to the types of questions asked.
Sensitive issues involving socially accepted or rejected
answers will produce more bias.

2. Sex of the Interviewer

Differences in response patterns according to the interviewer's
sex depend on subject matter as well as on the
composition of the respondent populations and other
characteristics of the specific survey situation.

3. Race of the Interviewer


The effects of the race of the interviewer on the respondent
should probably be viewed as the result of interaction
between interviewer and respondent characteristics.
Respondents often give socially desirable answers to interviewers
whose race differs from theirs, particularly if the
interviewee's social status is lower than that of the interviewer
and the topic of the question is threatening.



However, an interviewer's race can probably establish
different frames of reference even in nonsensitive areas.
Particularly in regard to social issues, more valid results
can be expected when the interviewer is of the same race as
the respondent.

4.  Experience  of  the  Interviewer 

It has been reported that there may be no significant differences
between interview completion rates for experienced
and inexperienced interviewers, and that the training and
experience of interviewers have no effect on the number
of deviations they make from the instructions. However,
regarding quality of interviews, all interviewers improve
with experience.


D.  Situational  Factors 


XIV-D  Page  1 
1 Jul  76 


Among the situational factors that should be considered when
interviews are used are the following:

1.  It  helps  greatly  if  the  interviewee  perceives  the  interviewer 
as  interested  in  hearing  his  comments,  as  willing  to  listen, 
and  (if  the  situation  requires)  as  willing  to  protect  him 
from  recrimination  for  being  adverse  in  his  evaluations. 

2. Interviews should be conducted in a quiet, temperature-controlled
environment where the respondent can be comfortable
and  relaxed.  Each  respondent  should  be  interviewed  in  private, 
separate  and  apart  from  all  others  so  that  no  other  person 
hears  or  is  biased  by  his  responses. 

3.  The  reinforcing  behaviors  of  the  interviewer  have  an  influence 
on  the  responses  collected,  and  at  times  may  cause  a respondent 
to  change  his  preferences.  Such  comments  as  "good"  or  "fine" 
and  such  actions  as  smiling  and  nodding  can  have  a decided 
effect  on  test  results.  Praised  respondents  normally  offer 
more  answers  than  unpraised  ones.  Praising  respondents  may 
also  tend  to  reduce  "Don’t  know"  answers  without  increasing 
insincere  or  dishonest  responses. 

4.  Interested  respondents  seem  to  be  more  subject  to  interviewer 
effects  than  uninterested  ones. 


XIV-E  Page  1 
1 Jul  76 


E.  Training  Interviewers 

Generally,  interviewers  require  a certain  amount  of  training. 
Such  a discussion,  however,  is  outside  the  scope  of  the  initial 
version  of  this  manual.  Army  personnel  may  check  with  the  Army 
Research  Institute-Field  Unit  closest  to  them  for  help  in  this 
area. 


XIV-F  Page  1 
1 Jul 76


F.  Data  Recording  and  Reduction 

In  the  structured  interview  both  questions  and  answers  are 
orally  communicated.  The  interviewer  may  encode  the  answers 
on  paper,  or  tape  record  the  responses  for  later  encoding 
(but  only  if  the  interviewee  agrees  to  the  taping  and  does 
not seem influenced by the presence of a recording device).

Other  topics  related  to  interview  data  recording  and 
reduction  are  outside  the  scope  of  the  initial  version  of  this 
manual.



XIV-G  Page  1 
1 Jul  76 

G.  Special  Interviewer  Problems 

This  section  notes  some  special  problems  related  to  interviews. 

When  interviews  are  used,  the  qualified  interviewer  will 
avoid  leading,  pressuring,  or  influencing  the  direction  of  an 
interviewee's  evaluations.  If  a potential  interviewer  has 
strong  preferences  regarding  the  system(s)  being  tested,  he 
should  probably  be  disqualified. 

Many studies have been conducted that show other biasing
effects of the interviewer. Factors leading to significant
effects of the interviewer upon results include: relatively
high ambiguity in the concept or wording of the inquiry; the
interviewer's "resistance" to a given question; and additional
questioning or probing. Interviewer bias can exist without
being apparent, and the direction of bias is not necessarily
uniform. The least interviewer bias is probably found with
questions that can be answered "Yes" or "No." The bias can
result from differences in interviewing methods, differences
in the degree of success in eliciting factual information,
and differences in classifying the respondent's answers. An
interviewer's expectations may have a more powerful effect on
the results than his ideological preferences.

Some interviewers have a tendency not to transmit printed
instructions word for word. Hence whole phrases may be
eliminated, and key words originally intended to focus the
respondent's attention on some specific point may be omitted or
changed. Key ideas are lost, mainly through omission.

Interviewer performance seems to vary both
across interviewers and within individuals.

An interviewer's attitude toward a question can communicate
itself sufficiently to the respondent so that the meaning of
the question is altered. Hence the nature of the survey and
the survey organization are determining factors in whether or
not the interviewer must follow the interview schedule verbatim,
or may vary the wording.


Army  Project  Number 
2Q763731A775 


TCATA 

DAHC19-74-C-0032


QUESTIONNAIRE  CONSTRUCTION  MANUAL 
ANNEX 

Dr.  Robert  F.  Dyer 
Josephine  J.  Matthews 
Josef  F.  Stulac 
Dr.  Calvin  E.  Wright 
Dr.  Kenneth  Yudowitch 
Operations  Research  Associates 


Submitted  by: 

George  M.  Cividen,  Chief 
Fort  Hood  Field  Unit 


July  1976 


Approved  by: 


Joseph  Zeidner,  Director 
Organizations  and  Systems 
Research  Laboratory 


J. E. Uhlaner, Technical Director
U.S.  Army  Research  Institute  for 
the  Behavioral  and  Social  Sciences 


